CONVERGENCE OF TOMLIN'S HOTS ALGORITHM 



OLIVIER FERCOQ * 

Abstract. The HOTS algorithm uses the hyperlink structure of the web to compute a vector 
of scores with which one can rank web pages. The HOTS vector is the vector of the exponentials of 
the dual variables of an optimal flow problem (the "temperature" of each page). The flow represents
an optimal distribution of web surfers on the web graph in the sense of entropy maximization. 

In this paper, we prove the convergence of Tomlin's HOTS algorithm. We first study a simplified version of the algorithm, which is a fixed point scaling algorithm designed to solve the matrix balancing problem for nonnegative irreducible matrices. The proof of convergence is general (nonlinear Perron-Frobenius theory) and applies to a family of deformations of HOTS. Then, we address the effective HOTS algorithm, designed by Tomlin for the ranking of web pages. The model is a network entropy maximization problem generalizing matrix balancing. We show that, under mild assumptions, the HOTS algorithm converges with a linear convergence rate. The proof relies on a uniqueness property of the fixed point and on the existence of a Lyapunov function.

We also show that the coordinate descent algorithm can be used to find the ideal and effective HOTS vectors and we compare HOTS and coordinate descent on fragments of the web graph. Our numerical experiments suggest that the convergence rate of the HOTS algorithm may deteriorate when the size of the input increases. We thus give a normalized version of HOTS with an experimentally better convergence rate.

1. Introduction. Internet search engines use a variety of algorithms to sort web 
pages based on their text content or on the hyperlink structure of the web. In this 
paper, we focus on an algorithm proposed by Tomlin in [31] for the ranking of web 
pages, called HOTS. It may also be used for other purposes like the ranking of sport 
teams [15]. Like PageRank g], HITS [17] and SALSA [25], HOTS uses the hyperlink
structure of the web (see also [50] [51] for surveys on link-based ranking algorithms).
This structure is summarized in the web graph, which is a digraph with a node for 
each web page and an arc between pages i and j if there is a hyperlink from page i 
to page j. 

The HOTS vector, used to rank web pages, is the vector of the exponentials 
of the dual variables of an optimal flow problem. The flow represents an optimal 
distribution of web surfers on the web graph in the sense of entropy maximization. 
The dual variable, one per page, is interpreted as the "temperature" of the page: the hotter a page, the better. In the case of PageRank, the flow of web surfers is determined by the uniform transition probability of following one hyperlink in the current page. This transition rule is in fact arbitrary. The HOTS model assumes that the web surfers choose the hyperlink to follow by maximizing the entropy of the flow. Tomlin showed that this vector is the solution of a nonlinear fixed point equation. He then proposed a scaling algorithm to compute the HOTS vector, based on this fixed point equation.

This algorithm solves the matrix balancing problem studied among others in [13] [5] [5J5] [55]. Given an $n \times n$ nonnegative matrix $A$, the matrix balancing problem consists in finding a matrix $X$ of the form $X = D^{-1} A D$ with $D$ diagonal positive definite and such that $\sum_k X_{i,k} = \sum_j X_{j,i}$ for all $i$. We shall compare Tomlin's HOTS algorithm with Schneider and Zenios's coordinate descent DSS algorithm [29]. The main difference between these algorithms is that in coordinate descent, the scaling is done node by node in the network (i.e. in a Gauss-Seidel fashion) whereas in Tomlin's HOTS algorithm, all the nodes are scaled at the same time, in a Jacobi fashion.
A problem close to the matrix balancing problem is the equivalence scaling problem, where, given an $m \times n$ nonnegative matrix $A$, we search for a matrix $X$ of the form $X = D_1 A D_2$ with $D_1$ and $D_2$ diagonal positive definite and such that $X$ is bistochastic. The Sinkhorn-Knopp algorithm [19] is a famous algorithm designed for the resolution of the scaling problem. We may see the HOTS algorithm as the analog of the Sinkhorn-Knopp algorithm for the matrix balancing problem: both algorithms correspond to fixed point iterations on the diagonal scalings. Moreover, Smith [30] and Knight [18] proposed to rank web pages according to the inverse of the corresponding entry in the diagonal scaling.

However, whereas the Sinkhorn-Knopp algorithm [19] and the coordinate descent algorithm [23] have been proved to converge, it does not seem that a theoretical result on the convergence of Tomlin's HOTS algorithm has been stated in previous works, although experiments [31] suggest that it is the case. Indeed, Knight [18, Sec. 5] pointed out that Tomlin did not state any convergence result for the HOTS algorithm. Another algorithm for the matrix balancing problem is given in [14], based on the equivalence between the matrix balancing problem and the problem of minimizing the dominant eigenvalue of an essentially nonnegative matrix under trace-preserving diagonal perturbations [15].

In this paper, we prove the convergence of Tomlin's HOTS algorithm. We first study a simplified version of the algorithm that we call the ideal HOTS algorithm. It is a fixed point scaling algorithm that solves the matrix balancing problem for nonnegative irreducible matrices. We prove its convergence thanks to nonlinear Perron-Frobenius theory (Theorem 3.6). The proof methods are general and apply to a family of deformations of HOTS. Then, we address the effective HOTS algorithm for the general case, which is the version designed by Tomlin for the ranking of web pages. Indeed, the web graph is not strongly connected, which implies that the balanced matrix does not necessarily exist. The model is a nonlinear network entropy maximization problem which generalizes matrix balancing. We show in Theorem 4.6 that, under mild assumptions, the HOTS algorithm converges with a linear rate of convergence. The proof relies on the properties of the ideal HOTS algorithm: uniqueness of the fixed point up to an additive constant and decrease of a Lyapunov function at every step (Theorem 3.9).

We also show that Schneider and Zenios's coordinate descent algorithm can be adapted to find the ideal and effective HOTS vectors. We compare the HOTS algorithm and coordinate descent on fragments of the web graph in Section 6. We considered small, medium and large size problems. In all cases the respective computational costs of both algorithms were similar. As the performance of the HOTS algorithm depends on the primitivity of the adjacency matrix considered while that of coordinate descent does not, coordinate descent can be thought to have a wider range of applications. However, the actual implementation of the HOTS algorithm is attractive for web scale problems: whereas coordinate descent DSS uses at each iteration (corresponding to a given web page) information from incoming and outgoing hyperlinks, the HOTS algorithm reduces to elementwise operations and left and right matrix-vector products. Hence, an iteration of the HOTS algorithm can be performed without computing or storing the transpose of the adjacency matrix.

We give an exact coordinate descent algorithm for the truncated scaling problem defined in [27] and we extend its use to the problem of computing the HOTS vector when some bounds on the web surfers flow are known. Experimental results show that exact coordinate descent is an efficient algorithm for web scale problems and that it is faster than the inexact coordinate descent algorithm presented in [28]. Finally, we observed that the convergence rate of the effective HOTS algorithm seems to deteriorate when the size of the graph considered increases. In order to overcome this feature, we propose a normalized version of the HOTS algorithm where we maximize a relative entropy of the flow of web surfers instead of the classical entropy. A byproduct is that the associated ranking favors pages with no outlink less than Tomlin's HOTS does.

The paper is organized as follows. In Section 2 we recall the main theorems of nonlinear Perron-Frobenius theory. In Section 3 we prove the convergence of the ideal HOTS algorithm and we give a Lyapunov function for this algorithm. In Section 4 we give the convergence rate of the effective HOTS algorithm. In Section 5 we study the HOTS problem with bounds on the flow of web surfers. In Section 6 we compare various candidate algorithms to compute the HOTS vector and in Section 7 we give the normalized HOTS algorithm.

2. Nonlinear Perron-Frobenius theory. The classical Perron-Frobenius theorem
(see [2] for instance) states that the spectral radius of a nonnegative matrix A is
an eigenvalue (called the Perron root) and that there exists an associated eigenvector 
with nonnegative coordinates. If, in addition, A is irreducible, then the Perron root 
is simple and the (unique up to a multiplicative constant) nonnegative eigenvector, 
called the Perron vector, has only positive entries. The nonlinear Perron-Frobenius 
theory is an extension of the Perron-Frobenius theorem to monotone and homogeneous 
maps. It has a multiplicative and an additive formulation. 

Definition 2.1. A map $T : \mathbb{R}^n \to \mathbb{R}^n$ is monotone if for all vectors $p, q$ such that $p \le q$, $T(p) \le T(q)$. A map $T : \mathbb{R}_+^n \to \mathbb{R}_+^n$ is homogeneous if for all vectors $p$ and for all nonnegative reals $\lambda$, $T(\lambda p) = \lambda T(p)$.

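Definition 2.2. A map $T : \mathbb{R}^n \to \mathbb{R}^n$ is additively homogeneous if for all vectors $p$ and for all reals $\lambda$, $T(\lambda + p) = \lambda + T(p)$, where $\lambda + p$ denotes the vector with entries $\lambda + p_i$.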

We can transform a multiplicative monotone, homogeneous map $T^\times$ into a monotone, additively homogeneous map $T^+$ and vice versa by the following operation, called the "logarithmic glasses":

$$T^+(p) = \log(T^\times(\exp(p)))$$

where log and exp act elementwise.
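As a small illustration, the following Python sketch applies the logarithmic glasses to the linear map $T^\times(x) = Ax$, assuming a hypothetical positive matrix `A` (any positive matrix gives a monotone, homogeneous map). It checks numerically that $T^+$ is additively homogeneous and monotone; the helper names `t_mult` and `t_add` are illustrative only.

```python
import math

A = [[1.0, 2.0], [3.0, 0.5]]  # hypothetical positive matrix

def t_mult(x):
    """Multiplicative map T^x(x) = A x: monotone and homogeneous on positive vectors."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def t_add(p):
    """Additive map T^+(p) = log(T^x(exp(p))), with log and exp acting elementwise."""
    return [math.log(v) for v in t_mult([math.exp(pi) for pi in p])]

p = [0.3, -1.2]
lam = 0.7
# Additive homogeneity: T^+(lam + p) - T^+(p) should equal lam in every coordinate.
shift = [u - v for u, v in zip(t_add([lam + pi for pi in p]), t_add(p))]
# Monotonicity: q >= p entrywise, so T^+(q) >= T^+(p) entrywise.
q = [0.5, -1.0]
monotone = all(u <= v for u, v in zip(t_add(p), t_add(q)))
```

The same two checks apply to any monotone, homogeneous $T^\times$; only `t_mult` would change.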

The following results show that monotone and homogeneous maps are nonexpansive. Hence, they are well suited for iterative algorithms.

Proposition 2.3 ([7]). An additively homogeneous map is nonexpansive for the sup-norm if and only if it is monotone.

For a more general result, we shall need Hilbert's projective metric. 

Definition 2.4. For $x, y$ two positive vectors of $\mathbb{R}^n$, Hilbert's projective metric between $x$ and $y$ is defined as

$$d(x, y) = \log\Big(\max_{i,j \in [n]} \frac{x_i y_j}{y_i x_j}\Big) .$$

Proposition 2.5 ([5]). Any monotone and homogeneous map is nonexpansive for Hilbert's metric.
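A minimal Python sketch of Hilbert's projective metric, assuming positive input vectors, can be used to observe Proposition 2.5 on two monotone, homogeneous maps: a linear map $x \mapsto Ax$ and the multiplicative ideal HOTS map $x_i \mapsto ((A^T x)_i / (A x^{-1})_i)^{1/2}$ studied in Section 3. The matrix `A` and the random test points are hypothetical.

```python
import math
import random

def hilbert_metric(x, y):
    """d(x, y) = log max_{i,j} (x_i y_j) / (y_i x_j) = log(max_i x_i/y_i) - log(min_i x_i/y_i)."""
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))

# Hypothetical nonnegative matrix with no zero row and no zero column.
A = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [3.0, 1.0, 0.0]]

def linear_map(x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def hots_map(x):
    """x_i -> ((A^T x)_i / (A x^{-1})_i)^{1/2}: monotone and homogeneous."""
    n = len(x)
    return [math.sqrt(sum(A[j][i] * x[j] for j in range(n)) /
                      sum(A[i][j] / x[j] for j in range(n))) for i in range(n)]

random.seed(1)
worst_ratio = 0.0  # largest observed d(T(x), T(y)) / d(x, y)
for _ in range(1000):
    x = [random.uniform(0.1, 10.0) for _ in range(3)]
    y = [random.uniform(0.1, 10.0) for _ in range(3)]
    d = hilbert_metric(x, y)
    for T in (linear_map, hots_map):
        worst_ratio = max(worst_ratio, hilbert_metric(T(x), T(y)) / d)
```

Since $d(cx, y) = d(x, y)$ for any $c > 0$, the metric only measures directions, which is why it suits homogeneous maps.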

Definition 2.6. For a map $T : \mathbb{R}^n \to \mathbb{R}^n$ or $T : \mathbb{R}_+^n \to \mathbb{R}_+^n$, we call graph of $T$, and we denote it $G(T)$, the directed graph with nodes $1, \dots, n$ and an arc from $i$ to $j$ if and only if $\lim_{t \to +\infty} T_i(t e_j) = +\infty$, where $e_j$ is the $j$th basis vector.
The following results give conditions for the existence and uniqueness of the "eigenvector" of a monotone, (additively or multiplicatively) homogeneous map.

Theorem 2.7 (Theorem 2 in [11]). Let $T$ be a monotone, additively homogeneous map. If $G(T)$ is strongly connected, then there exist $u \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$ such that $T(u) = \lambda + u$. We say that $u$ is an additive eigenvector of $T$.

Theorem 2.8 (Corollary 2.5 in [24], Theorem 2.3 in [10], Theorem 6.8 in [1]). Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a continuously differentiable map which is monotone and additively homogeneous and has an additive eigenvector $p \in \mathbb{R}^n$. If $\nabla T(p)$ is irreducible, then the eigenvector is unique up to an additive constant. If $\nabla T(p)$ is primitive, then all the orbits defined by

$$p_{k+1} = T(p_k) - \psi(T(p_k))$$

for a given additively homogeneous function $\psi : \mathbb{R}^n \to \mathbb{R}$ converge to $p - \psi(p)$ linearly at a rate equal to $|\lambda_2(\nabla T(p))| = \max\{|\lambda| ;\ \lambda \in \mathrm{spectrum}(\nabla T(p)),\ \lambda \neq 1\}$.

These theorems have been stated under more general assumptions, among others semi-differentiability [1] and infinite state space [24]. However, for the sake of simplicity, we present them here in this simpler form. These theorems can also be written in multiplicative form.
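To make the orbit of Theorem 2.8 concrete, here is a minimal Python sketch, assuming the hypothetical positive matrix `A` below and $T = \log \circ A \circ \exp$ (the linear map $x \mapsto Ax$ seen through the logarithmic glasses). With the additively homogeneous normalization $\psi(p) = \mathrm{mean}(p)$, the orbit is the power method in logarithmic coordinates: it converges to the logarithm of the Perron vector, and $\psi(T(p_k))$ converges to the additive eigenvalue $\log \rho(A)$.

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical positive matrix: Perron root 3, Perron vector (1, 1)

def T(p):
    """T = log o A o exp: monotone, additively homogeneous, continuously differentiable."""
    return [math.log(sum(a * math.exp(pj) for a, pj in zip(row, p))) for row in A]

def psi(p):
    """Normalization psi(p) = mean(p), which is additively homogeneous."""
    return sum(p) / len(p)

p = [0.0, 1.0]
for _ in range(60):
    tp = T(p)
    c = psi(tp)              # tends to the additive eigenvalue log(3)
    p = [v - c for v in tp]  # p_{k+1} = T(p_k) - psi(T(p_k))
```

Here $\nabla T$ at the eigenvector is $A/3$, whose eigenvalues are $1$ and $1/3$, so the sketch should exhibit linear convergence at rate $1/3$.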

These results give a general framework to prove that the HOTS score and more 
generally many web rankings are well defined, i.e. that the score is unique and that the 
fixed point algorithm (or power algorithm) used to compute them indeed converges 
to the expected ranking. 

3. The ideal HOTS algorithm. The web graph is a graph constructed from 
the hyperlink structure of the web. Each web page is represented by a node and there 
is an arc between nodes i and j if and only if page i points to page j. We shall denote 
by A the adjacency matrix of the web graph. 

There are two versions of the HOTS algorithm: an ideal version for strongly connected graphs, i.e. for irreducible adjacency matrices, and an effective version for general graphs that we will study in Section 4. The HOTS algorithm for irreducible matrices is designed for the resolution of the following nonlinear network flow problem. The optimization variable $\rho_{i,j}$ represents the traffic of web surfers on the hyperlink from page $i$ to page $j$.

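$$\max_{\rho \ge 0} \; -\sum_{i,j \in [n]} \rho_{i,j}\Big(\log\Big(\frac{\rho_{i,j}}{A_{i,j}}\Big) - 1\Big)$$

$$\sum_{j \in [n]} \rho_{i,j} = \sum_{j \in [n]} \rho_{j,i}, \quad \forall i \in [n] \qquad (p_i)$$

$$\sum_{i,j \in [n]} \rho_{i,j} = 1 \qquad (\mu)$$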

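The dual problem consists in minimizing the function $\theta$ on $\mathbb{R}^n \times \mathbb{R}$ where

$$\theta(p, \mu) = \sum_{i,j \in [n]} A_{i,j} e^{p_i - p_j + \mu} - \mu .$$

We use the convention that $0 \log(0) = 0$ and that $x \log(x/0) = 0$ if $x = 0$ and $x \log(x/0) = +\infty$ otherwise.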

If $(p, \mu)$ is a minimizer of $\theta$, then the value of $\exp(p_i)$ is interpreted as the temperature of page $i$, the hotter the better. We call it the HOTS (Hyperlinked Object Temperature Scale) score.
The ideal HOTS algorithm (Algorithm 1) reduces to the fixed point iterations for the function $f$ defined by

$$f(x) = \frac{1}{2}\big(\log(A^T e^{x}) - \log(A e^{-x})\big) . \qquad (3.1)$$

Denoting $y_i = e^{p_i}$, we can write it in multiplicative form to spare computing the exponentials and logarithms.



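Algorithm 1 Ideal HOTS algorithm [31]

Start with an initial point $y^0 \in \mathbb{R}^n$, $y^0 > 0$. Given $y^k$, compute $y^{k+1}$ such that

$$y_i^{k+1} = \Big(\frac{\sum_{j \in [n]} A_{j,i}\, y_j^k}{\sum_{j \in [n]} A_{i,j}\, (y_j^k)^{-1}}\Big)^{1/2}, \quad \forall i \in [n] .$$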



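Algorithm 2 Coordinate descent DSS [29]

Start with an initial point $y^0 \in \mathbb{R}^n$, $y^0 > 0$. Given $y^k$, select a coordinate $i \in [n]$ and compute $y^{k+1}$ such that

$$y_i^{k+1} = \Big(\frac{\sum_{j \in [n]} A_{j,i}\, y_j^k}{\sum_{j \in [n]} A_{i,j}\, (y_j^k)^{-1}}\Big)^{1/2}, \qquad y_j^{k+1} = y_j^k, \quad \forall j \neq i .$$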
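The difference between the Jacobi update of Algorithm 1 and the Gauss-Seidel update of Algorithm 2 can be seen in the following minimal Python sketch. It assumes a hypothetical irreducible matrix with zero diagonal (so the DSS update is an exact coordinate minimization) for which $A + A^T$ is primitive, and uses the cyclic rule for DSS. Both runs should reach a balanced matrix $\mathrm{diag}(y) A\, \mathrm{diag}(y)^{-1}$, with score vectors that agree up to a multiplicative constant.

```python
import math

# Hypothetical irreducible matrix with zero diagonal; A + A^T is primitive (triangle graph).
A = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
n = len(A)

def update(y, i):
    """Scaling of coordinate i: sqrt(sum_j A_ji y_j / sum_j A_ij / y_j)."""
    return math.sqrt(sum(A[j][i] * y[j] for j in range(n)) /
                     sum(A[i][j] / y[j] for j in range(n)))

def ideal_hots_step(y):
    """Algorithm 1: Jacobi update, every coordinate computed from the previous iterate."""
    return [update(y, i) for i in range(n)]

def dss_sweep(y):
    """Algorithm 2 with the cyclic rule: Gauss-Seidel update, coordinates modified in place."""
    y = list(y)
    for i in range(n):
        y[i] = update(y, i)
    return y

def imbalance(y):
    """max_i |row sum - column sum| of diag(y) A diag(y)^{-1}."""
    return max(abs(sum(A[i][j] * y[i] / y[j] for j in range(n)) -
                   sum(A[j][i] * y[j] / y[i] for j in range(n))) for i in range(n))

y_hots = [1.0] * n
y_dss = [1.0] * n
for _ in range(500):
    y_hots = ideal_hots_step(y_hots)
    y_dss = dss_sweep(y_dss)
```

Note that `ideal_hots_step` only needs the products $A^T y$ and $A y^{-1}$, which is the property that makes Algorithm 1 attractive at web scale.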



We shall compare the HOTS algorithm with Schneider and Zenios's coordinate descent DSS algorithm (Algorithm 2). This is indeed a coordinate descent algorithm since for every $k$, denoting $p_i = \log(y_i)$, we have, for any fixed $\mu$,

$$p_i^{k+1} = \arg\min_{x \in \mathbb{R}} \theta(p_1^k, \dots, p_{i-1}^k, x, p_{i+1}^k, \dots, p_n^k, \mu) .$$

Coordinate descent algorithms (Algorithm 3) are designed to solve

$$\min_{x \in X} \phi(x) \qquad (3.2)$$

where $X$ is a possibly unbounded box of $\mathbb{R}^n$ and $\phi$ has the form $\phi(x) = \psi(Ex) + \langle b, x \rangle$, $\psi$ is a proper closed convex function, $E$ is an $m \times n$ matrix having no zero row and $b$ is a vector of $\mathbb{R}^n$.

Algorithm 3 Coordinate descent

Start with an initial point $x^0 \in \mathbb{R}^n$. Given $x^k$, select a coordinate $i \in [n]$ and compute $x^{k+1}$ such that

$$x_i^{k+1} = \arg\min_{y} \phi(x_1^k, \dots, x_{i-1}^k, y, x_{i+1}^k, \dots, x_n^k), \qquad x_j^{k+1} = x_j^k, \quad \forall j \neq i .$$
Proposition 3.1 ([33]). Assume that the set of optimal solutions $X^*$ of (3.2) is nonempty, that the domain of $\psi$ is open, that $\psi$ is twice continuously differentiable on its domain and that $\nabla^2 \psi(Ex)$ is positive definite for all $x \in X^*$. Let $(x^k)_k$ be a sequence generated by the coordinate descent algorithm (Algorithm 3), using the cyclic rule (more general rules are also possible). Then $(x^k)_k$ converges at least linearly to an element of $X^*$.

We now study the fixed point operator $f$ defined in (3.1).

Proposition 3.2 ([31]). A vector $p \in \mathbb{R}^n$ is a fixed point of $f$ defined in (3.1) if and only if the couple $(p, \mu)$ with $\mu = -\log(\sum_{i,j \in [n]} A_{i,j} e^{p_i - p_j})$ is a minimizer of the dual function. Moreover, in this case, denoting $D = \mathrm{diag}(\exp(p))$, $e^{\mu} D A D^{-1}$ is a maximizer of the network flow problem.

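Proof. As $\theta$ is convex and differentiable, a couple $(p, \mu)$ is a minimizer if and only if it cancels the gradient. Since $\frac{\partial \theta}{\partial \mu}(p, \mu) = \sum_{i,j \in [n]} A_{i,j} e^{p_i - p_j} e^{\mu} - 1$, we get the expression of the optimal $\mu$ as a function of $p$. To conclude, we remark that

$$\frac{\partial \theta}{\partial p_k}(p, \mu) = e^{\mu}\Big(\sum_{j \in [n]} A_{k,j} e^{p_k - p_j} - \sum_{i \in [n]} A_{i,k} e^{p_i - p_k}\Big) = 0, \quad \forall k \in [n],$$

is equivalent to $f(p) = p$. To get back to the primal problem, we remark that the primal cost of $e^{\mu} D A D^{-1}$ is equal to the dual cost of $(p, \mu)$ and that it is an admissible circulation. □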

Proposition 3.3. The map $f$ defined in (3.1) is monotone and additively homogeneous (Definitions 2.1 and 2.2).

Proof. For all real $\lambda$ and for all vectors $p, q$ such that $p \le q$, $f(\lambda + p) = \lambda + f(p)$ (because $\log(e^{\lambda}) = \lambda$) and $f(p) \le f(q)$ (because log and exp are increasing functions and $A$ is nonnegative). □

The following result gives the conditions for the existence and uniqueness of the 
ideal HOTS vector. 

Theorem 3.4 ([H]). There exists $v \in \mathbb{R}^n$ such that $f(v) = v$ and $\sum_{i \in [n]} v_i = 0$ if and only if $A$ has a diagonal similarity scaling, if and only if $A$ is completely reducible. If in addition $A$ is irreducible, then this vector is unique.

Corollary 3.5 ([29]). If $A$ is completely reducible, coordinate descent DSS (Algorithm 2) converges linearly to a vector $v$ such that $\mathrm{diag}(v) A\, \mathrm{diag}(v)^{-1}$ is balanced.

To prove the convergence of the ideal HOTS algorithm (Algorithm 1), we use nonlinear Perron-Frobenius theory, the main theorems of which are stated in Section 2.

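Theorem 3.6. Let $f$ be the map defined in (3.1). If $A$ is irreducible and $A + A^T$ is primitive, then $f$ has a fixed point $v$, unique up to an additive constant, and for all $x \in \mathbb{R}^n$ the iterates $f^k(x)$ converge to a fixed point $v_x$ of $f$ (which differs from $v$ by an additive constant) with

$$\limsup_{k \to \infty} \|f^{k}(x) - v_x\|^{1/k} \le |\lambda_2(P)| = \max\{|\lambda| ;\ \lambda \in \mathrm{spectrum}(P),\ \lambda \neq 1\}$$

where $P = \frac{1}{2}\big(\mathrm{diag}(A^T e^{v})^{-1} A^T \mathrm{diag}(e^{v}) + \mathrm{diag}(A e^{-v})^{-1} A\, \mathrm{diag}(e^{-v})\big)$. In particular, the ideal HOTS algorithm (Algorithm 1) converges linearly at rate $|\lambda_2(P)|$.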

Proof. The iterates of the fixed point iteration defined by $p^0 = x$ and $p^{k+1} = f(p^k)$ satisfy $p^k = \log(y^k)$, where $y^k$ is the $k$th iterate of the ideal HOTS algorithm (Algorithm 1) started with $y^0 = \exp(x)$. Hence, by continuous differentiability of the exponential, the rate of convergence of both versions of the algorithm is the same. By Theorem 3.4, as $A$ is irreducible, $f$ has a fixed point $v$ and $\mathrm{diag}(\exp(v))$ is a solution of the matrix balancing problem associated to $A$. Now easy calculations show that $\nabla f(v) = P$. As $P$ has the same pattern as $A + A^T$, $P$ is primitive if and only if $A + A^T$ is. The result follows from Theorem 2.8. □
This theorem shows that the HOTS vector for the irreducible case is well defined if $A$ is irreducible and that, if $A + A^T$ is primitive, then the ideal HOTS algorithm (Algorithm 1) converges linearly to the HOTS vector.
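The rate in Theorem 3.6 can be observed on a small example. The Python sketch below assumes the hypothetical matrix $A = \begin{pmatrix} 1 & 4 \\ 1 & 1 \end{pmatrix}$, which is irreducible with $A + A^T$ positive. It runs the fixed point iteration of $f$, builds $P = \nabla f(v)$ at the limit and, since a $2 \times 2$ row-stochastic matrix has eigenvalues $1$ and $\mathrm{trace}(P) - 1$, compares $\lambda_2(P)$ with the observed ratio of successive errors on the gap $p_1 - p_2$.

```python
import math

A = [[1.0, 4.0], [1.0, 1.0]]  # hypothetical: irreducible, and A + A^T is positive
n = 2

def f(p):
    """Ideal HOTS map (3.1): f(p) = 1/2 (log(A^T e^p) - log(A e^{-p}))."""
    return [0.5 * (math.log(sum(A[j][i] * math.exp(p[j]) for j in range(n)))
                   - math.log(sum(A[i][j] * math.exp(-p[j]) for j in range(n))))
            for i in range(n)]

def jacobian_P(v):
    """P = 1/2 (diag(A^T e^v)^{-1} A^T diag(e^v) + diag(A e^{-v})^{-1} A diag(e^{-v}))."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s_in = sum(A[j][i] * math.exp(v[j]) for j in range(n))
        s_out = sum(A[i][j] * math.exp(-v[j]) for j in range(n))
        for j in range(n):
            P[i][j] = 0.5 * (A[j][i] * math.exp(v[j]) / s_in + A[i][j] * math.exp(-v[j]) / s_out)
    return P

p = [0.0, 0.0]
gaps = []  # p_1 - p_2 along the iterations; it converges to -log(2) for this A
for _ in range(60):
    p = f(p)
    gaps.append(p[0] - p[1])

P = jacobian_P(p)
lam2 = P[0][0] + P[1][1] - 1.0  # other eigenvalue of a 2x2 row-stochastic matrix
errors = [g - gaps[-1] for g in gaps]
observed_ratio = errors[11] / errors[10]  # should be close to lam2
```

For this matrix $P$ has eigenvalues $1$ and $-1/3$, so the errors shrink by a factor $3$ and alternate in sign.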

Remark 1. The ideal HOTS algorithm (Algorithm 1) requires a primitivity assumption in order to converge, which coordinate descent DSS (Algorithm 2) does not require. On the other hand, the convergence rate of coordinate descent DSS is not explicitly given while Theorem 3.6 gives the convergence rate of ideal HOTS.

Remark 2. Changing the diagonal of A does not change the optimal scaling, so 
we can choose a nonzero diagonal for A in the preceding theorem. This is useful when 
A is irreducible but not primitive. 

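The fixed point equation defining the ideal HOTS vector is

$$y_i = \Big(\frac{\sum_{j \in [n]} A_{j,i}\, y_j}{\sum_{j \in [n]} A_{i,j}\, y_j^{-1}}\Big)^{1/2}, \quad \forall i \in [n] .$$

Indeed, page $i$ has a good HOTS score if it is linked to by pages with a good HOTS score and if it does not link to pages with a bad HOTS score.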

We thus introduce the following set of fixed point ranking algorithms. 



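Algorithm 4 Deformed HOTS algorithm

Let $\alpha, \beta \ge 0$ such that $\alpha + \beta = 1$ and let $g : \mathbb{R}^n \to \mathbb{R}^n$ be defined for all $i$ by

$$g_i(x) = \Big(\sum_{j \in [n]} A_{j,i}\, x_j\Big)^{\alpha} \Big(\sum_{j \in [n]} A_{i,j}\, x_j^{-1}\Big)^{-\beta} .$$

Given an initial point $d^0 \in \mathbb{R}^n$, $d^0 > 0$, and a norm $\|\cdot\|$, the deformed HOTS algorithm is defined by

$$d^{k+1} = \frac{g(d^k)}{\|g(d^k)\|} .$$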



Proposition 3.7. Let $\alpha, \beta \ge 0$ such that $\alpha + \beta = 1$. If $A$ is irreducible and $\alpha A + \beta A^T$ is primitive, then the deformed HOTS algorithm (Algorithm 4) converges linearly to a positive vector.

Proof. Let $h = \log \circ g \circ \exp$. As in the proof of Theorem 3.6, the rate of convergence for the fixed point iterations with $g$ or $h$ is the same. The map $h$ is monotone and additively homogeneous. For $\alpha > 0$, its graph contains the graph of $A^T$, which is strongly connected since $A$ is irreducible. Hence, for $\alpha > 0$, $h$ has an eigenvector by Theorem 2.7. For $\alpha = 0$, as $A$ is irreducible, by the Perron-Frobenius theorem [2], $A$ has a positive eigenvector $x$. Then $\log(x^{-1})$ is an eigenvector of $h$. Now, $\nabla h(v) = \alpha\, \mathrm{diag}(A^T e^{v})^{-1} A^T \mathrm{diag}(e^{v}) + \beta\, \mathrm{diag}(A e^{-v})^{-1} A\, \mathrm{diag}(e^{-v})$, so we have the convergence as soon as $\alpha A + \beta A^T$ is primitive by Theorem 2.8. □

Remark 3. For $\alpha = 1/2$, we have the fixed point diagonal similarity scaling; for $\alpha = 1$, we have the ranking by the Perron vector U6T/; and for $\alpha = 0$, we have an "anti-Perron" score, where good pages are those that do not link to pages with a bad score.
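A minimal Python sketch of Algorithm 4 illustrates the two extreme cases of Remark 3 that are easiest to check. It assumes a hypothetical irreducible matrix with positive diagonal (so $\alpha A + \beta A^T$ is primitive for every $\alpha$) and uses the $\ell^1$ norm for the normalization, which is one admissible choice of $\|\cdot\|$. For $\alpha = 1$ the iteration is the power method on $A^T$; for $\alpha = 1/2$ it returns the ideal HOTS scaling.

```python
# Hypothetical irreducible matrix with positive diagonal.
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0],
     [1.0, 0.0, 1.0]]
n = len(A)

def deformed_hots(alpha, iters=2000):
    """Algorithm 4 with beta = 1 - alpha, normalizing by the sum of the entries."""
    beta = 1.0 - alpha
    d = [1.0] * n
    for _ in range(iters):
        g = [sum(A[j][i] * d[j] for j in range(n)) ** alpha *
             sum(A[i][j] / d[j] for j in range(n)) ** (-beta) for i in range(n)]
        total = sum(g)
        d = [gi / total for gi in g]
    return d

d_perron = deformed_hots(1.0)  # alpha = 1: Perron vector of A^T
d_hots = deformed_hots(0.5)    # alpha = 1/2: ideal HOTS scaling
# (A^T d)_i / d_i should not depend on i for a Perron vector.
perron_quotients = [sum(A[j][i] * d_perron[j] for j in range(n)) / d_perron[i] for i in range(n)]
# Row sums minus column sums of diag(d) A diag(d)^{-1} should vanish for the HOTS scaling.
hots_imbalance = max(abs(sum(A[i][j] * d_hots[i] / d_hots[j] for j in range(n)) -
                         sum(A[j][i] * d_hots[j] / d_hots[i] for j in range(n))) for i in range(n))
```

The same function with `alpha=0.0` gives the "anti-Perron" score; only the exponents change.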

The following result gives a global contraction factor in the case when $A$ is positive.
Proposition 3.8. If $\kappa(A)$ is the contraction factor of $A$ in Hilbert's metric ($\kappa(A) < 1$ if $A$ is positive), then $f$, written in multiplicative form $g = \exp \circ f \circ \log$, is $\frac{\kappa(A^T) + \kappa(A)}{2}$-contracting in Hilbert's metric.
Proof. Let $x$ and $y$ be two positive vectors such that $\eta y \le x \le \nu y$ elementwise, with $\eta$ and $\nu$ chosen so that $\log(\nu/\eta) = d(x, y)$. Then $\eta' A^T y \le A^T x \le \nu' A^T y$ with $\log(\nu'/\eta') \le \kappa(A^T) \log(\nu/\eta)$. We also have $\eta'' A y^{-1} \le A x^{-1} \le \nu'' A y^{-1}$ with $\log(\nu''/\eta'') \le \kappa(A) \log(\nu/\eta)$. Hence,

$$d(g(x), g(y)) \le \log\Big(\Big(\frac{\nu' \nu''}{\eta' \eta''}\Big)^{1/2}\Big) \le \frac{\kappa(A^T) + \kappa(A)}{2} \log\Big(\frac{\nu}{\eta}\Big) . \qquad □$$
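Proposition 3.8 can be checked numerically if one takes for $\kappa(A)$ Birkhoff's contraction coefficient $\tanh(\Delta(A)/4)$, where $\Delta(A) = \max_{i,j,k,l} \log\frac{A_{i,k} A_{j,l}}{A_{j,k} A_{i,l}}$ is the projective diameter of a positive matrix (Birkhoff-Hopf theorem). This choice of $\kappa$ is an assumption of the sketch; the positive matrix and the test points are hypothetical.

```python
import math
import random

def hilbert_metric(x, y):
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))

def birkhoff_kappa(A):
    """Birkhoff contraction coefficient tanh(Delta(A) / 4) of a positive matrix A."""
    rows, cols = range(len(A)), range(len(A[0]))
    delta = max(math.log(A[i][k] * A[j][l] / (A[j][k] * A[i][l]))
                for i in rows for j in rows for k in cols for l in cols)
    return math.tanh(delta / 4)

def g(A, x):
    """Multiplicative form exp o f o log of the ideal HOTS map."""
    n = len(x)
    return [math.sqrt(sum(A[j][i] * x[j] for j in range(n)) /
                      sum(A[i][j] / x[j] for j in range(n))) for i in range(n)]

random.seed(2)
n = 4
A = [[random.uniform(0.5, 2.0) for _ in range(n)] for _ in range(n)]  # hypothetical positive matrix
At = [list(col) for col in zip(*A)]
factor = (birkhoff_kappa(At) + birkhoff_kappa(A)) / 2
worst = 0.0  # largest observed d(g(x), g(y)) / d(x, y)
for _ in range(500):
    x = [random.uniform(0.1, 10.0) for _ in range(n)]
    y = [random.uniform(0.1, 10.0) for _ in range(n)]
    worst = max(worst, hilbert_metric(g(A, x), g(A, y)) / hilbert_metric(x, y))
```

The observed ratio is typically well below the bound, which is a worst case over all pairs of positive vectors.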

A key technical ingredient of the convergence of the effective HOTS algorithm described in the next section will be Theorem 3.9 below, showing that each iteration $p \mapsto f(p)$ of the ideal HOTS algorithm does not increase the dual objective function.

Theorem 3.9 (Lyapunov function). For all $p \in \mathbb{R}^n$, $\theta(f(p)) \le \theta(p)$.

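Proof. Let us denote $\psi(p, q) = \sum_{i,j \in [n]} e^{p_i} A_{i,j} e^{-q_j}$. For any fixed $\mu$, $\theta(p, \mu) = e^{\mu} \psi(p, p) - \mu$, and $\min_{\mu} \theta(p, \mu) = 1 + \log \psi(p, p)$; both are increasing functions of $\psi(p, p)$, so it is enough to show that $\psi(f(p), f(p)) \le \psi(p, p)$. Since $e^{2 f_j(p)} = \big(\sum_{i} e^{p_i} A_{i,j}\big) / \big(\sum_{k} A_{j,k} e^{-p_k}\big)$,

$$\psi(p, 2f(p) - p) = \sum_{j \in [n]} \Big(\sum_{i \in [n]} e^{p_i} A_{i,j}\Big) e^{p_j} \frac{\sum_{k \in [n]} A_{j,k} e^{-p_k}}{\sum_{i \in [n]} e^{p_i} A_{i,j}} = \sum_{j,k \in [n]} e^{p_j} A_{j,k} e^{-p_k} = \psi(p, p),$$

and similarly $\psi(2f(p) - p, p) = \psi(p, p)$. Now, as $\psi$ is convex,

$$\psi(f(p), f(p)) = \psi\Big(\frac{1}{2}(2f(p) - p, p) + \frac{1}{2}(p, 2f(p) - p)\Big) \le \frac{1}{2}\psi(2f(p) - p, p) + \frac{1}{2}\psi(p, 2f(p) - p) = \psi(p, p) . \qquad □$$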
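The Lyapunov property is easy to observe numerically. This Python sketch assumes a hypothetical random sparse nonnegative matrix (a cycle is added so that no row or column is zero, which keeps $f$ well defined) and tracks $\psi(p, p) = \sum_{i,j} A_{i,j} e^{p_i - p_j}$ along the iterates of $f$ from a random starting point; Theorem 3.9 predicts a nonincreasing sequence.

```python
import math
import random

random.seed(3)
n = 5
# Hypothetical sparse nonnegative matrix; the cycle i -> i+1 rules out zero rows and columns.
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][(i + 1) % n] = 1.0 + random.random()
    for j in range(n):
        if j != (i + 1) % n and random.random() < 0.3:
            A[i][j] = random.random()

def psi_diag(p):
    """psi(p, p) = sum_{i,j} A_ij exp(p_i - p_j), i.e. theta(p, 0)."""
    return sum(A[i][j] * math.exp(p[i] - p[j]) for i in range(n) for j in range(n))

def f(p):
    """Ideal HOTS map (3.1)."""
    return [0.5 * (math.log(sum(A[j][i] * math.exp(p[j]) for j in range(n)))
                   - math.log(sum(A[i][j] * math.exp(-p[j]) for j in range(n))))
            for i in range(n)]

p = [random.uniform(-3.0, 3.0) for _ in range(n)]
values = [psi_diag(p)]
for _ in range(30):
    p = f(p)
    values.append(psi_diag(p))
```

Unlike Theorem 3.6, this decrease requires no primitivity assumption, which is what makes it usable in Section 4.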



4. The effective HOTS algorithm. Theorem 3.4 gives conditions for the existence and uniqueness of the HOTS vector in the ideal case. In practice the irreducibility condition does not hold for the web graph. The classical solution to this problem is to add a small positive value to the adjacency matrix [H [21] in order to get a positive matrix. Tomlin proposed an alternative approach based on the network flow model. We consider the following nonlinear network flow problem with network given by

$$A' = \begin{pmatrix} A & \mathbf{1} \\ \mathbf{1}^T & 0 \end{pmatrix}$$

where $\mathbf{1}$ denotes the vector with all entries equal to 1.

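$$\max_{\rho \ge 0} \; -\sum_{i,j \in [n+1]} \rho_{i,j}\Big(\log\Big(\frac{\rho_{i,j}}{A'_{i,j}}\Big) - 1\Big)$$

$$\sum_{j \in [n+1]} \rho_{i,j} = \sum_{j \in [n+1]} \rho_{j,i}, \quad \forall i \in [n+1] \qquad (p_i)$$

$$\sum_{i,j \in [n+1]} \rho_{i,j} = 1 \qquad (\mu)$$

$$\sum_{i \in [n]} \rho_{n+1,i} = 1 - \alpha \qquad (a)$$

$$1 - \alpha = \sum_{i \in [n]} \rho_{i,n+1} \qquad (b)$$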



We use the conventions that $0 \log(0) = 0$ and that $x \log(x/0) = 0$ if and only if $x = 0$ (it is $+\infty$ otherwise). In this new model, we add an artificial node connected to all the other nodes and such that the flow through this node is prescribed to be $1 - \alpha$.



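The algorithm is designed for the minimization of the dual function $\theta$ where

$$\theta(p, \mu, a, b) = \sum_{i,j \in [n]} A_{i,j} e^{p_i - p_j + \mu} + \sum_{i \in [n]} e^{-b - p_{n+1} + p_i + \mu} + \sum_{i \in [n]} e^{a + p_{n+1} - p_i + \mu} - (1 - \alpha) a - \mu + (1 - \alpha) b . \qquad (4.1)$$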

We first give the following counter-example, showing that the problem may be ill 
posed. 

Counter-Example 1. The dual function $\theta$ may be unbounded from below.
Proof. Take

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} .$$


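We have, after minimization with respect to $\mu$, $a$ and $b$,

$$\theta(p) = C(\alpha) + (1 - \alpha)\Big(\log\Big(\sum_{i \in [n]} e^{p_i}\Big) + \log\Big(\sum_{i \in [n]} e^{-p_i}\Big)\Big) + (2\alpha - 1) \log(e^{p_1 - p_2} + e^{p_2 - p_3})$$

where $C(\alpha) \in \mathbb{R}$. For all $k \in \mathbb{R}$,

$$\begin{aligned} \theta(-k, 0, k, 0) &= C(\alpha) + 2(1 - \alpha) \log(1 + e^{k} + e^{-k}) + (2\alpha - 1) \log(e^{-k} + e^{-k}) \\ &= C(\alpha) + 2(1 - \alpha) k - (2\alpha - 1) k + 2(1 - \alpha) \log(1 + e^{-k} + e^{-2k}) + (2\alpha - 1) \log(2) . \end{aligned}$$

The coefficient of $k$ is $3 - 4\alpha$, so for $\alpha > 3/4$, $\theta$ is unbounded from below. □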
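The slope $3 - 4\alpha$ along the ray $k \mapsto (-k, 0, k)$ can be checked with a short Python sketch. It evaluates $\theta(p)$ through the closed form $C(\alpha) + (1-\alpha)\phi(p) + (1-\alpha)\phi(-p) + (2\alpha - 1)\log(\sum_{i,j} A_{i,j} e^{p_i - p_j})$ given in the proof of Lemma 4.2 (with $p_{n+1} = 0$, $\phi$ the log-sum-exp function), using a numerically stable log-sum-exp; the values of $\alpha$ are illustrative.

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def reduced_dual(A, p, alpha):
    """theta(p) after minimizing over mu, a and b, with p_{n+1} = 0."""
    C = 1 - 2 * (1 - alpha) * math.log(1 - alpha) - (2 * alpha - 1) * math.log(2 * alpha - 1)
    n = len(p)
    inner = [math.log(A[i][j]) + p[i] - p[j] for i in range(n) for j in range(n) if A[i][j] > 0]
    return (C + (1 - alpha) * (logsumexp(p) + logsumexp([-x for x in p]))
            + (2 * alpha - 1) * logsumexp(inner))

A = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]  # matrix of Counter-example 1

def slope(alpha, k1=20.0, k2=40.0):
    """Average slope of k -> theta(-k, 0, k) between k1 and k2; close to 3 - 4 alpha."""
    return (reduced_dual(A, [-k2, 0.0, k2], alpha)
            - reduced_dual(A, [-k1, 0.0, k1], alpha)) / (k2 - k1)
```

For example, `slope(0.9)` is close to $-0.6$, so $\theta$ decreases without bound along the ray, while `slope(0.6)` is positive.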

This example is indeed rather degenerate: the HOTS algorithm can only diverge because it searches for the minimum of an unbounded function. In other words, it tries to solve a network flow problem without any admissible flow. We shall give conditions under which there exists a HOTS vector and show that the HOTS algorithm converges to the HOTS vector when these conditions hold.

Remark 4. A natural idea to establish the convergence of a fixed point algorithm is to show that it is a contraction in Hilbert's metric. In the case of the effective HOTS algorithm, even when the matrix $A$ is positive, the fixed point algorithm may not be a contraction (take a perturbation of Counter-example 1).

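Lemma 4.1 ([31]). For any $p \in \mathbb{R}^{n+1}$, the minimum of $\theta(p, \mu, a, b)$ with respect to $\mu$, $a$ and $b$ is unique and given by

$$\mu = \log\Big(\frac{2\alpha - 1}{\sum_{i,j \in [n]} A_{i,j} e^{p_i - p_j}}\Big),$$

$$a = \log\Big(\frac{1 - \alpha}{2\alpha - 1}\, \frac{\sum_{i,j \in [n]} A_{i,j} e^{p_i - p_j}}{\sum_{i \in [n]} e^{p_{n+1} - p_i}}\Big),$$

$$b = -\log\Big(\frac{1 - \alpha}{2\alpha - 1}\, \frac{\sum_{i,j \in [n]} A_{i,j} e^{p_i - p_j}}{\sum_{i \in [n]} e^{p_i - p_{n+1}}}\Big) .$$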

Proof. The function $\theta(p, \cdot, \cdot, \cdot)$ is convex and differentiable, so the optimality condition is just that the gradient is zero. One can easily see that the only triple that cancels the gradient is the one given in the lemma. □
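The closed form of Lemma 4.1 can be verified by plugging it into the partial derivatives of (4.1). In this Python sketch the matrix `A`, the point `p` (whose last entry plays the role of $p_{n+1}$) and $\alpha = 0.85$ are hypothetical; $\alpha > 1/2$ is needed for the logarithms to be defined.

```python
import math

def optimal_multipliers(A, p, alpha):
    """Closed form of Lemma 4.1; p has n + 1 entries, p[n] being p_{n+1}."""
    n = len(A)
    s = sum(A[i][j] * math.exp(p[i] - p[j]) for i in range(n) for j in range(n))
    ratio = (1 - alpha) / (2 * alpha - 1)
    mu = math.log((2 * alpha - 1) / s)
    a = math.log(ratio * s / sum(math.exp(p[n] - p[i]) for i in range(n)))
    b = -math.log(ratio * s / sum(math.exp(p[i] - p[n]) for i in range(n)))
    return mu, a, b

def partial_derivatives(A, p, alpha, mu, a, b):
    """Derivatives of theta in (4.1) with respect to mu, a and b."""
    n = len(A)
    inner = sum(A[i][j] * math.exp(p[i] - p[j] + mu) for i in range(n) for j in range(n))
    into_artificial = sum(math.exp(-b - p[n] + p[i] + mu) for i in range(n))
    from_artificial = sum(math.exp(a + p[n] - p[i] + mu) for i in range(n))
    return (inner + into_artificial + from_artificial - 1,
            from_artificial - (1 - alpha),
            (1 - alpha) - into_artificial)

A = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.5, 0.0]]  # hypothetical
p = [0.2, -0.5, 1.0, 0.3]
alpha = 0.85
grads = partial_derivatives(A, p, alpha, *optimal_multipliers(A, p, alpha))
```

At the optimum the flows split as $2\alpha - 1$ on the original hyperlinks and $1 - \alpha$ into and out of the artificial node.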
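We denote $\lambda = (\mu, a, b)$ and $\lambda(p)$ the solution of the minimization of $\theta(p, \lambda)$ with respect to $\lambda$. For $\lambda \in \mathbb{R}^3$, we denote

$$f_i^{\lambda}(p) = \frac{1}{2}\Big(\log\Big(\sum_{j \in [n]} A_{j,i} e^{p_j} + e^{p_{n+1} + a}\Big) - \log\Big(\sum_{k \in [n]} A_{i,k} e^{-p_k} + e^{-p_{n+1} - b}\Big)\Big) . \qquad (4.2)$$

We also define $g^{\lambda} = \exp \circ f^{\lambda} \circ \log$:

$$g_i^{\lambda}(y) = \Big(\frac{\sum_{j \in [n]} A_{j,i}\, y_j + e^{a} y_{n+1}}{\sum_{k \in [n]} A_{i,k}\, (y_k)^{-1} + e^{-b} (y_{n+1})^{-1}}\Big)^{1/2} .$$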



Algorithm 5 Effective HOTS algorithm

Given an initial point $y^0 \in \mathbb{R}^n$, $y^0 > 0$, the effective HOTS algorithm is defined by

$$y^{k+1} = g^{\lambda(\log(y^k))}(y^k) .$$
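A minimal Python sketch of Algorithm 5 follows. It fixes the artificial coordinate to $y_{n+1} = 1$ (harmless since, as noted in the proof of Theorem 4.5, $F$ does not depend on $p_{n+1}$), recomputes $e^{a}$ and $e^{-b}$ from Lemma 4.1 at each step, and renormalizes the scores, which are only defined up to a positive factor. The reducible graph with a page without outlink and the value $\alpha = 0.6$ are hypothetical; for them a primal feasible point with the pattern of $A'$ exists, so Theorem 4.6 applies. The residual function checks flow conservation at every page.

```python
import math

def effective_hots(A, alpha, iters=2000):
    """Sketch of Algorithm 5 with y_{n+1} fixed to 1 and l1-normalized scores."""
    n = len(A)
    gamma = (1 - alpha) / (2 * alpha - 1)
    y = [1.0] * n
    for _ in range(iters):
        s = sum(A[i][j] * y[i] / y[j] for i in range(n) for j in range(n))
        e_a = gamma * s / sum(1.0 / yi for yi in y)  # e^{a} from Lemma 4.1, y_{n+1} = 1
        e_mb = gamma * s / sum(y)                     # e^{-b} from Lemma 4.1
        y = [math.sqrt((sum(A[j][i] * y[j] for j in range(n)) + e_a) /
                       (sum(A[i][j] / y[j] for j in range(n)) + e_mb)) for i in range(n)]
        total = sum(y)
        y = [yi / total for yi in y]
    return y

def conservation_residual(A, alpha, y):
    """max_i |inflow_i - outflow_i| in the augmented network, up to the factor e^mu."""
    n = len(A)
    gamma = (1 - alpha) / (2 * alpha - 1)
    s = sum(A[i][j] * y[i] / y[j] for i in range(n) for j in range(n))
    e_a = gamma * s / sum(1.0 / yi for yi in y)
    e_mb = gamma * s / sum(y)
    return max(abs(sum(A[j][i] * y[j] / y[i] for j in range(n)) + e_a / y[i]
                   - sum(A[i][j] * y[i] / y[j] for j in range(n)) - e_mb * y[i])
               for i in range(n))

# Hypothetical reducible graph: 0 -> 1 -> 2 -> 0 and 2 -> 3; page 3 has no outlink.
A = [[0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 0.0]]
scores = effective_hots(A, alpha=0.6)
```

Each step only uses the products $A^T y$ and $A y^{-1}$ plus a few scalar sums, as in the ideal algorithm.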



Lemma 4.2 ([3]). Let $d \in \mathbb{R}^n$ such that $d_i > 0$ for all $i$. The function $\theta$ defined by $\theta(p) = \min_{p_{n+1}, \mu, a, b} \theta((p, p_{n+1}), \mu, a, b)$ is strictly convex on the hyperplane $H = \{x \in \mathbb{R}^n \mid \sum_{i \in [n]} d_i x_i = 0\}$. In particular, there exists at most one HOTS vector up to an additive constant.

Proof. From the expressions of $a + p_{n+1}$, $b + p_{n+1}$ and $\mu$ at the optimum (Lemma 4.1), respectively given by $e^{a + p_{n+1}} = \frac{1 - \alpha}{\sum_{i \in [n]} e^{-p_i}} e^{-\mu}$, $e^{-b - p_{n+1}} = \frac{1 - \alpha}{\sum_{i \in [n]} e^{p_i}} e^{-\mu}$ and $e^{\mu} = \frac{2\alpha - 1}{\sum_{i,j \in [n]} A_{i,j} e^{p_i - p_j}}$, we can write $\theta$ as

$$\theta(p) = C(\alpha) + (1 - \alpha)\phi(-p) + (1 - \alpha)\phi(p) + (2\alpha - 1) \log\Big(\sum_{i,j \in [n]} A_{i,j} e^{p_i - p_j}\Big)$$

where $\phi : x \mapsto \log(\sum_{i \in [n]} e^{x_i})$ is the log-sum-exp function, which is strictly convex on any hyperplane that does not contain the vector with all entries equal to 1, and $C(\alpha) = 1 - 2(1 - \alpha)\log(1 - \alpha) - (2\alpha - 1)\log(2\alpha - 1)$. The function $\theta$ is the sum of convex functions and strictly convex functions, so it is strictly convex on $H$. We conclude that the minimum of $\theta$ on $H$ is unique if it exists. We can then extend this result to the whole space since $\theta(\eta + p) = \theta(p)$ for all real numbers $\eta$. □

Lemma 4.3. Let $d$ and $\theta$ be as in Lemma 4.2. The function $\theta$ is coercive on the hyperplane $\{x \in \mathbb{R}^{n+1} \mid \sum_{i \in [n+1]} d_i x_i = 0\}$ if and only if there exists a primal solution with the same pattern as $A'$.

Proof. If the function $\theta$ is coercive, there exists a dual solution and thus there also exists a primal solution with the same pattern as $A'$.

If there exists a primal solution with the same pattern as $A'$, the constraint qualification conditions are satisfied [27], and there exists a dual solution. By Lemma 4.2, $\theta$ is strictly convex on the hyperplane $\{x \in \mathbb{R}^{n+1} \mid \sum_{i \in [n+1]} d_i x_i = 0\}$. Thus it is necessarily coercive on this hyperplane. □

Lemma 4.4. If $A \neq 0$, then for any fixed $\lambda$, the iterative algorithm consisting in successive applications of the map $f^{\lambda}$ defined in (4.2) converges to a minimizer of the function $p \mapsto \theta(p, \lambda)$. Moreover, this minimizer is unique up to an additive constant.
Proof. The map $f^{\lambda}$ corresponds to the ideal HOTS fixed point operator (3.1) for the matrix

$$\begin{pmatrix} A & e^{-b}\mathbf{1} \\ e^{a}\mathbf{1}^T & 0 \end{pmatrix}$$

where $\mathbf{1}$ denotes the vector with all entries equal to 1. As this matrix is primitive as soon as $A \neq 0$, Theorem 3.6 and Proposition 3.2 apply. □

Theorem 4.5. Let $F$ be defined by $F(p) = f^{\lambda(p)}(p)$ as in (4.2). Let $p^*$ be the logarithm of a HOTS vector, defined by $p^* = F(p^*)$. The matrix $\nabla F(p^*)$ has all its eigenvalues in the real interval $(-1, 1]$ and the eigenvalue 1 is simple.

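Proof. Let us denote $\gamma = \frac{1 - \alpha}{2\alpha - 1}$. With $p_{n+1} = 0$, Lemma 4.1 gives $e^{a} = \gamma \frac{\sum_{i,j \in [n]} A_{i,j} e^{p_i - p_j}}{\sum_{i \in [n]} e^{-p_i}}$ and $e^{-b} = \gamma \frac{\sum_{i,j \in [n]} A_{i,j} e^{p_i - p_j}}{\sum_{i \in [n]} e^{p_i}}$.

Then for all $k \in [n]$,

$$F_k(p) = \frac{1}{2} \log\Big(\sum_{i \in [n]} A_{i,k} e^{p_i} + e^{a}\Big) - \frac{1}{2} \log\Big(\sum_{j \in [n]} A_{k,j} e^{-p_j} + e^{-b}\Big) .$$

As no coordinate of $F(p)$ depends on $p_{n+1}$, we may consider the reduced function, which we shall still denote $F$, that associates $F(p, 0)$ to $p \in \mathbb{R}^n$. The eigenvalues of the gradient of the original function are 0 and the eigenvalues of the gradient of the reduced function.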
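First, we have

$$F(p) = p + \frac{1}{2} \log\Big(\mathbf{1} - \mathrm{diag}(d) \frac{\partial \theta}{\partial p}\Big)$$

where $\theta(p) = \min_{p_{n+1}, \mu, a, b} \theta((p, p_{n+1}), \mu, a, b)$, the logarithm acts elementwise and $d_i = \big(\sum_{k \in [n]} A_{i,k} e^{\mu + p_i - p_k} + e^{\mu + p_i - b}\big)^{-1}$ (see [9] for more details on this calculus). Differentiating this equality at the fixed point $p^*$, where $\frac{\partial \theta}{\partial p}(p^*) = 0$, we deduce that $\nabla F = I_n - \frac{1}{2} \mathrm{diag}(d) \nabla^2 \theta$. Let $\lambda$ be an eigenvalue of $\nabla F$. This means that there exists a nonzero vector $x$ such that

$$\lambda x = \nabla F x = x - \frac{1}{2} \mathrm{diag}(d) \nabla^2 \theta\, x,$$

that is,

$$\nabla^2 \theta\, x = 2(1 - \lambda)\, \mathrm{diag}(d^{-1})\, x .$$

This is a generalized eigenvalue problem with $\nabla^2 \theta$ symmetric positive semi-definite by convexity of $\theta$ and $\mathrm{diag}(d^{-1})$ diagonal positive definite. Hence $2(1 - \lambda)$ is necessarily a nonnegative real number, so $\lambda$ is real and $\lambda \le 1$. Also, if $\lambda = 1$, then $x$ is proportional to the vector with all its entries equal to 1 (by Proposition 10 in [9], which is a simple extension of Lemma 4.2), and thus the eigenvalue 1 is simple.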

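We shall now show that all the eigenvalues of $\nabla F(p^*)$ are strictly greater than $-1$. Differentiating the expression of $e^{a}$, we get

$$\frac{\partial e^{a}}{\partial p_k} = \gamma\, \frac{\sum_{j \in [n]} A_{k,j} e^{p_k - p_j} - \sum_{i \in [n]} A_{i,k} e^{p_i - p_k}}{\sum_{m \in [n]} e^{-p_m}} + \gamma \sum_{i,j \in [n]} A_{i,j} e^{p_i - p_j}\, \frac{e^{-p_k}}{(\sum_{m \in [n]} e^{-p_m})^2} .$$

But as $p^*$ is a fixed point of $F$, it satisfies the equality

$$\sum_{i \in [n]} A_{i,k} e^{p^*_i - p^*_k} + e^{a^* - p^*_k} = \sum_{j \in [n]} A_{k,j} e^{p^*_k - p^*_j} + e^{-b^* + p^*_k}$$

which can be rewritten as

$$\sum_{j \in [n]} A_{k,j} e^{p^*_k - p^*_j} - \sum_{i \in [n]} A_{i,k} e^{p^*_i - p^*_k} = \gamma \sum_{i,j \in [n]} A_{i,j} e^{p^*_i - p^*_j} \Big(\frac{e^{-p^*_k}}{\sum_{m \in [n]} e^{-p^*_m}} - \frac{e^{p^*_k}}{\sum_{m \in [n]} e^{p^*_m}}\Big) .$$

Hence

$$\frac{\partial e^{a}}{\partial p_k}(p^*) = \gamma \sum_{i,j \in [n]} A_{i,j} e^{p^*_i - p^*_j} \Big(\frac{\gamma}{\sum_{m \in [n]} e^{-p^*_m}} \Big(\frac{e^{-p^*_k}}{\sum_{m \in [n]} e^{-p^*_m}} - \frac{e^{p^*_k}}{\sum_{m \in [n]} e^{p^*_m}}\Big) + \frac{e^{-p^*_k}}{(\sum_{m \in [n]} e^{-p^*_m})^2}\Big) .$$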

Let us introduce $d'$ such that

$$(d'_k)^{-1} = \sum_{i \in [n]} A_{i,k} e^{p^*_i - p^*_k} + e^{a^* - p^*_k} = \sum_{j \in [n]} A_{k,j} e^{p^*_k - p^*_j} + e^{-b^* + p^*_k} . \qquad (4.3)$$
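Doing the same for $e^{-b}$ as for $e^{a}$ and differentiating $F$, we get

$$\frac{\partial F_k}{\partial p_l}(p^*) = \frac{1}{2} d'_k A_{l,k} e^{p^*_l - p^*_k} + \frac{1}{2} d'_k A_{k,l} e^{p^*_k - p^*_l} + \frac{1}{2} \gamma \sum_{i,j \in [n]} A_{i,j} e^{p^*_i - p^*_j}\, d'_k \Big[(1 + \gamma) \frac{e^{-p^*_l - p^*_k}}{(\sum_{m} e^{-p^*_m})^2} - \gamma \frac{e^{p^*_l - p^*_k}}{\sum_{m} e^{p^*_m} \sum_{m} e^{-p^*_m}} - \gamma \frac{e^{p^*_k - p^*_l}}{\sum_{m} e^{p^*_m} \sum_{m} e^{-p^*_m}} + (1 + \gamma) \frac{e^{p^*_l + p^*_k}}{(\sum_{m} e^{p^*_m})^2}\Big]$$

where all sums over $m$ range over $[n]$. We can now decompose $\frac{\partial F}{\partial p}$ as

$$\frac{\partial F}{\partial p} = D' S + D' R$$

where $D' = \mathrm{diag}(d')$, $S$ is a symmetric matrix with nonnegative entries

$$S_{k,l} = \frac{1}{2} A_{l,k} e^{p^*_l - p^*_k} + \frac{1}{2} A_{k,l} e^{p^*_k - p^*_l} + \frac{1}{2} \gamma \sum_{i,j \in [n]} A_{i,j} e^{p^*_i - p^*_j} \Big(\frac{e^{-p^*_l - p^*_k}}{(\sum_{m} e^{-p^*_m})^2} + \frac{e^{p^*_l + p^*_k}}{(\sum_{m} e^{p^*_m})^2}\Big)$$

and $R$ is the following symmetric rank 1 matrix

$$R = \frac{1}{2} \gamma^2 \sum_{i,j \in [n]} A_{i,j} e^{p^*_i - p^*_j}\, w w^T, \qquad w_k = \frac{e^{-p^*_k}}{\sum_{m} e^{-p^*_m}} - \frac{e^{p^*_k}}{\sum_{m} e^{p^*_m}} .$$

The nonnegative matrix $D' S$ satisfies $\sum_l d'_k S_{k,l} = 1$ for all $k$; thus, by the Perron-Frobenius theorem [2], we have exhibited a Perron vector and the spectral radius of the matrix is 1. Moreover, $D' S$ is positive, so every other eigenvalue has a modulus strictly smaller than 1.

The matrix $D' R$ is a rank 1 matrix and its only possibly nonzero eigenvalue is nonnegative. Indeed it is equal to $\frac{1}{2} \gamma^2 \sum_{i,j} A_{i,j} e^{p^*_i - p^*_j} \sum_k d'_k w_k^2$.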
Let $\lambda$ be an eigenvalue of $\frac{\partial F}{\partial p} = D' S + D' R$. It is then also an eigenvalue of the similar matrix $(D')^{1/2} S (D')^{1/2} + (D')^{1/2} R (D')^{1/2}$, which is symmetric. Hence,

$$\lambda \ge \min_{x \in \mathbb{R}^n : \|x\|_2 = 1} x^T (D')^{1/2} S (D')^{1/2} x + x^T (D')^{1/2} R (D')^{1/2} x .$$

As the spectral radius of $D' S$ is 1 and $-1$ is not one of its eigenvalues, the same is true for $(D')^{1/2} S (D')^{1/2}$, so $x^T (D')^{1/2} S (D')^{1/2} x > -\|x\|_2^2$ for every nonzero vector $x$. As $D' R$ has only nonnegative eigenvalues, $(D')^{1/2} R (D')^{1/2}$ is positive semi-definite and $x^T (D')^{1/2} R (D')^{1/2} x \ge 0$ for all $x$. As a conclusion,

$$\lambda \ge \min_{x \in \mathbb{R}^n : \|x\|_2 = 1} x^T (D')^{1/2} S (D')^{1/2} x + x^T (D')^{1/2} R (D')^{1/2} x > -1 . \qquad □$$



Theorem 4.6. Let $F(p) = f^{\lambda(p)}(p)$ as in (4.2). If there exists a primal feasible point with the same pattern as $A'$, then the effective HOTS algorithm (Algorithm 5) converges to a HOTS vector $e^{p^*}$ (unique up to a multiplicative constant) linearly at a rate $|\lambda_2(\nabla F(p^*))| = \max\{|\lambda| ;\ \lambda \in \mathrm{spectrum}(\nabla F(p^*)),\ \lambda \neq 1\}$.

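Proof. Let $\tilde F$ be the map defined by

$$\tilde F(p) = F(p) - \frac{\sum_{i\in[n+1]} (d'_i)^{-1} F_i(p)}{\sum_{i\in[n+1]} (d'_i)^{-1}}\,\mathbf{1},$$

with $d'_i$ defined in (4.3) in the proof of Theorem 4.5 for $i \in [n]$ and $(d'_{n+1})^{-1} = 0$. For all $k$, let $p_k$ be the $k$th iterate of the HOTS algorithm, i.e. $p_{k+1} = F(p_k)$, and let $\lambda_k = \lambda(p_k)$. We also define $q_k$ by $q_0 = p_0$ and $q_{k+1} = \tilde F(q_k)$. By Theorem 3.9 and by definition of $\lambda_{k+1}$, we have $\theta(p_k, \lambda_k) \ge \theta(p_{k+1}, \lambda_k) \ge \theta(p_{k+1}, \lambda_{k+1})$. As $q_k - p_k$ is proportional to the vector with all entries equal to 1, $\lambda(p_k) = \lambda(q_k) = \lambda_k$ and $\theta(q_k, \lambda) = \theta(p_k, \lambda)$ for all $k$ and $\lambda$. Hence

$$\theta(q_k, \lambda_k) \ge \theta(q_{k+1}, \lambda_k) \ge \theta(q_{k+1}, \lambda_{k+1}) . \qquad (4.4)$$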

Now, for all $k \ge 1$, $q_k \in H = \{x \in \mathbb{R}^{n+1} \mid \sum_{i\in[n+1]} (d'_i)^{-1} x_i = 0\}$. As, by Lemma 4.3, $\theta$ is coercive on $H$, $\theta$ is bounded from below and $(\theta(q_k, \lambda_k))_k$ converges to, say, $\bar\theta$. Moreover, the sequence $(q_k)_k$ must be bounded. By continuity of the function $\lambda(\cdot)$, $(\lambda_k)_k$ is also bounded. Hence, both sequences have limit points.

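Let $\bar q$ be a limit point of $(q_k)_k$ and $\bar\lambda = \lambda(\bar q)$. For all $\varepsilon > 0$ and $K > 0$, there exist $k > K$ and $k' > k + 1$ such that $\|q_k - \bar q\| < \varepsilon$ and $\|q_{k'} - \bar q\| < \varepsilon$. By (4.4),

$$\theta(q_k, \lambda_k) \ge \theta(q_{k+1}, \lambda_k) \ge \theta(q_{k+1}, \lambda(q_{k+1})) \ge \theta(q_{k'}, \lambda_{k'})$$

where $q_{k+1} = \tilde F(q_k)$. When $\varepsilon$ tends to 0 and $K$ tends to infinity, we get, with $\hat q = \tilde F(\bar q)$,

$$\theta(\bar q, \bar\lambda) \ge \theta(\hat q, \bar\lambda) \ge \theta(\hat q, \lambda(\hat q)) \ge \theta(\bar q, \bar\lambda) .$$

In particular, $\theta(\hat q, \bar\lambda) = \theta(\hat q, \lambda(\hat q))$. By Lemma 4.1, this implies that $\bar\lambda \in \arg\min_\lambda \theta(\hat q, \lambda)$ and thus $\bar\lambda = \lambda(\hat q)$ by uniqueness of the minimizer.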

Similarly, $\hat q$ is also a limit point of $(q_k)_k$ and we may consider the sequence $(u_k)_k$ such that

$$u_k = \tilde F^k(\bar q) = (f^{\bar\lambda})^k(\bar q) - \frac{\sum_{i\in[n+1]} (d'_i)^{-1} \big((f^{\bar\lambda})^k(\bar q)\big)_i}{\sum_{i\in[n+1]} (d'_i)^{-1}}\,\mathbf{1} .$$

Iterating the argument of the preceding paragraph, for any $k$, $\lambda(u_k) = \bar\lambda$ and $u_k$ is a limit point of $(q_k)_k$. Now, by Lemma 4.4, the sequence $(u_k)_k$ converges to $q^* \in \arg\min_q \theta(q, \bar\lambda)$. As we also have $\lambda(q^*) = \bar\lambda$, we conclude that $(q^*, \bar\lambda)$ is a minimizer of $\theta$ and that there exists a limit point of $(q_k, \lambda_k)_k$ that minimizes $\theta$. Now, as $(\theta(q_k, \lambda_k))_k$ is decreasing, all the limit points of $(q_k)_k$ minimize $\theta$. The uniqueness of the minimizer of $\theta$ on $H$ (Lemma 4.2) gives the convergence of the effective HOTS algorithm to the HOTS vector in the projective space, that is, the convergence of the sequence $(q_k)_k$.
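We shall now prove that the sequence $(p_k)_{k\ge 0}$ indeed converges. Let us denote $\rho = \max\{|\lambda| ; \lambda \in \operatorname{spectrum}(\nabla F(q^*)), \lambda \neq 1\}$. By Theorem 4.5, we know that $\rho < 1$. Denoting by $(\lambda_i, u_i, v_i)_{i\in[n+1]}$ the eigenvalues and the right and left eigenvectors of $\nabla F(q^*) = \sum_i \lambda_i u_i v_i^T$, with $\lambda_1 = 1$, $u_1 = \mathbf{1}$ and $v_1 = (d')^{-1} / \sum_{j\in[n+1]} (d'_j)^{-1}$, we have

$$\nabla \tilde F(q^*) = \sum_{i\ge 2} \lambda_i u_i v_i^T + u_1 v_1^T - \frac{\mathbf{1}\,((d')^{-1})^T}{\sum_{j\in[n+1]} (d'_j)^{-1}} = \sum_{i\ge 2} \lambda_i u_i v_i^T .$$

Hence, by [25], for all $\varepsilon' > 0$, there exists a norm $\|\cdot\|$ such that, for all $x \in \mathbb{R}^{n+1}$,

$$\|\nabla \tilde F(q^*) x\| = \Big\|\sum_{i\ge 2} \lambda_i u_i v_i^T x\Big\| \le (\rho + \varepsilon'/2)\|x\| .$$

Thus, since $\tilde F$ is differentiable at $q^*$, for $x \in \mathbb{R}^{n+1}$ sufficiently close to $q^*$,

$$\|\tilde F(x) - q^*\| = \|\tilde F(x) - \tilde F(q^*)\| \le \|\nabla \tilde F(q^*)(x - q^*)\| + \tfrac{\varepsilon'}{2}\|x - q^*\| \le (\rho + \varepsilon')\|x - q^*\| .$$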
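We deduce that $(q_k)_k$ converges linearly at rate $\rho$ to $q^*$. Now, for all $k$, since $F(q^*) = q^* \in H$, we have

$$p_k = q_k + \sum_{l=0}^{k-1} \frac{\sum_{i\in[n+1]} (d'_i)^{-1} F_i(q_l)}{\sum_{i\in[n+1]} (d'_i)^{-1}}\,\mathbf{1} = q_k + \sum_{l=0}^{k-1} \frac{\sum_{i\in[n+1]} (d'_i)^{-1} \big(F_i(q_l) - F_i(q^*)\big)}{\sum_{i\in[n+1]} (d'_i)^{-1}}\,\mathbf{1} = q_k + \sum_{l=0}^{k-1} m_l\,\mathbf{1},$$

where $|m_l| = O(\|F(q_l) - F(q^*)\|) = O(\|q_l - q^*\|) = O(\rho^l)$. Hence $(m_l)_l$ is summable and $\sum_{l=0}^{k-1} m_l$ converges linearly at rate $\rho$. Finally, $(p_k)_k$ converges linearly at rate $\rho$. Like in the proof of Theorem 4.6, we deduce the convergence of the sequence $(\exp(p_k))_k$ linearly at rate $\rho$ to a HOTS vector. □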
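In practice, a linear rate such as $\rho$ can be observed from the ratio of successive increments $\|x_{k+1}-x_k\| / \|x_k - x_{k-1}\|$. The following sketch (NumPy; the function name `empirical_rate` is hypothetical) applies this estimator to a linear toy map whose second eigenvalue is known. For a nonlinear map such as $F$, we assume, without this being claimed in the text, that the ratio typically approaches $|\lambda_2(\nabla F(p^*))|$ near the fixed point.

```python
import numpy as np

def empirical_rate(step, x0, n_iter=20):
    """Estimate the linear rate of x_{k+1} = step(x_k) from the ratio
    ||x_{k+1} - x_k|| / ||x_k - x_{k-1}|| of the last two increments."""
    x_prev = np.asarray(x0, dtype=float)
    x = step(x_prev)
    ratio = np.nan
    for _ in range(n_iter):
        x_next = step(x)
        den = np.linalg.norm(x - x_prev)
        if den == 0.0:                 # exact convergence: rate no longer observable
            break
        ratio = np.linalg.norm(x_next - x) / den
        x_prev, x = x, x_next
    return ratio

# Toy linear map: the row-stochastic M has eigenvalue 1 (all-ones eigenvector)
# and second eigenvalue 0.7, which governs the observed rate.
M = np.array([[0.9, 0.1], [0.2, 0.8]])
rate = empirical_rate(lambda x: M @ x, [1.0, 0.0])
```

Here the eigenvalue 1 plays the role of the direction $\mathbf{1}$ removed by the projection onto $H$, so only the second eigenvalue shows up in the increments.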

The last result of this section shows that coordinate descent is an alternative algorithm for the computation of the effective HOTS vector.

Proposition 4.7. If there exists a primal feasible point with the same pattern as $A'$, the coordinate descent algorithm applied to the unrestricted minimization of the dual function defined in (4.1) and choosing coordinates in a cyclic order converges linearly to a HOTS vector.

Proof. If there exists a primal feasible point with the same pattern as $A'$, then the set of minimizers of $\theta$ is nonempty by Lemmas 4.2 and 4.3. The function has the required form with $\varphi(x) = \sum_{i,j} A'_{i,j} \exp(x_{i,j})$, where the sum runs over the pairs $(i,j)$ such that $A'_{i,j} > 0$. The Hessian of $\varphi$ is clearly positive definite for all $x$. Thus the hypotheses of Proposition 3.1 are verified and the result follows. □

5. An exact coordinate descent for the truncated scaling problem. 

Truncated scaling problems were introduced by Schneider in [27, 28] in order to generalize both matrix balancing and row-column equivalence scaling. Given an $n \times n$ matrix $A$ and bounds $L$ and $U$ such that $L_{i,j} \le U_{i,j}$, the truncated scaling problem consists in finding a matrix $X$ of the form $X = T(D^{-1} A D)$, with $D$ diagonal positive definite and $T$ the truncation operator $T_{i,j}(X) = \max(\min(U_{i,j}, X_{i,j}), L_{i,j})$, such that $\sum_k X_{i,k} = \sum_k X_{k,i}$ for all $i$. This problem is equivalent to the following optimization problem

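$$\min_{\rho \ge 0} \sum_{i,j\in[n]} \rho_{i,j}\Big(\log\Big(\frac{\rho_{i,j}}{A_{i,j}}\Big) - 1\Big)$$

$$\sum_{j\in[n]} \rho_{i,j} = \sum_{j\in[n]} \rho_{j,i}, \quad \forall i \in [n] \qquad (p_i)$$

$$L_{i,j} \le \rho_{i,j} \le U_{i,j}, \quad \forall i,j \in [n] \qquad (\eta_{i,j}, \zeta_{i,j})$$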
If the bounds satisfy $L_{i,j} = 0$ and $U_{i,j} = +\infty$ for all $i, j$, then we have a matrix balancing problem. The reduction of row-column equivalence scaling to truncated scaling relies on a graph transformation described by Schneider in [27, Lemma 1].

The dual function can take two forms. In [27], Schneider proposes to relax the equality constraints and to keep the bound constraints in the objective function. One gets a dual function of the form

$$\theta(p) = \sum_{i,j\in[n]} \psi^*_{i,j}(p_i - p_j) \qquad (5.1)$$

where each $\psi^*_{i,j} : \mathbb{R} \to \mathbb{R}$ is convex. Then one can perform an inexact coordinate descent where, at each step, the minimization along the coordinate is not necessarily exact.

Here, we shall study the choice of relaxing all the constraints. This approach has been proposed in [32] for another generalization of the row-column equivalence scaling problem (but this generalization does not include truncated scaling). Then, we get the following dual function:

$$\theta(p, \eta, \zeta) = \sum_{i,j\in[n]} \varphi^*_{i,j}(p_i - p_j + \eta_{i,j} - \zeta_{i,j}) - \sum_{i,j\in[n]} L_{i,j}\,\eta_{i,j} + \sum_{i,j\in[n]} U_{i,j}\,\zeta_{i,j} \qquad (5.2)$$

where $\varphi^*_{i,j}(x) = A_{i,j} e^{x}$. We shall minimize it with unrestricted $p$ and nonnegative $\eta$ and $\zeta$. As in [8, 32], we shall show in Proposition 5.1 that exact expressions of the minimizers along one single coordinate exist.

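Proposition 5.1. Given $p \in \mathbb{R}^n$, the minimizers of $\min_{\eta \ge 0, \zeta \ge 0} \theta(p, \eta, \zeta)$ are given for all $i$ and $j$ by

$$\exp(\eta_{i,j}) = \max\Big(\frac{L_{i,j}}{A_{i,j} e^{p_i - p_j}}, 1\Big), \qquad \exp(-\zeta_{i,j}) = \min\Big(\frac{U_{i,j}}{A_{i,j} e^{p_i - p_j}}, 1\Big).$$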

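Proof. The proofs for $\eta$ and $\zeta$ are symmetric, so we only do the one for $\eta$. Two cases may occur: either $\frac{\partial \theta}{\partial \eta_{i,j}} = 0$ and $\eta_{i,j} > 0$, or $\eta_{i,j} = 0$ and $\frac{\partial \theta}{\partial \eta_{i,j}} \ge 0$. Hence, either $\exp(\eta_{i,j}) = \frac{L_{i,j}}{A_{i,j} e^{p_i - p_j - \zeta_{i,j}}}$ and $\exp(\eta_{i,j}) > 1$, or $\exp(\eta_{i,j}) = 1$ and $1 \ge \frac{L_{i,j}}{A_{i,j} e^{p_i - p_j - \zeta_{i,j}}} \ge \frac{L_{i,j}}{A_{i,j} e^{p_i - p_j}}$. We thus have the result if $\eta_{i,j}$ and $\zeta_{i,j}$ are not positive together.

Now suppose that $\eta_{i,j} > 0$ and $\zeta_{i,j} > 0$. In this case, we have $\exp(\eta_{i,j}) = \frac{L_{i,j}}{A_{i,j} e^{p_i - p_j - \zeta_{i,j}}}$ and $\exp(-\zeta_{i,j}) = \frac{U_{i,j}}{A_{i,j} e^{p_i - p_j + \eta_{i,j}}}$. This implies that $U_{i,j} = L_{i,j}$. Thus the two bound constraints are in fact an equality constraint and we shall consider the unconstrained multiplier $\xi_{i,j} = \eta_{i,j} - \zeta_{i,j}$ instead of the two former multipliers. Then $\xi_{i,j}$ satisfies $\exp(\xi_{i,j}) = \frac{L_{i,j}}{A_{i,j} e^{p_i - p_j}}$. It is unique and can be redecomposed into $\eta_{i,j}$ and $\zeta_{i,j}$ as in the proposition. □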
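Proposition 5.1 translates into a short vectorized routine. The sketch below (NumPy; `optimal_multipliers` is a hypothetical name) takes $x = A_{i,j} e^{p_i - p_j} > 0$ and the bounds, returns $\eta$ and $\zeta$, and lets one check that the corrected flow $x\,e^{\eta - \zeta}$ is the truncation of $x$ to $[L_{i,j}, U_{i,j}]$, which is the identity used just below.

```python
import numpy as np

def optimal_multipliers(x, L, U):
    """Closed-form minimizers of Proposition 5.1 (sketch).

    x = A_ij * exp(p_i - p_j) > 0 elementwise; assumes L <= U (U may be np.inf).
    Returns (eta, zeta), both elementwise nonnegative.
    """
    x, L, U = (np.asarray(t, dtype=float) for t in (x, L, U))
    exp_eta = np.maximum(L / x, 1.0)          # exp(eta_ij)   = max(L_ij / x, 1)
    exp_minus_zeta = np.minimum(U / x, 1.0)   # exp(-zeta_ij) = min(U_ij / x, 1)
    return np.log(exp_eta), -np.log(exp_minus_zeta)

x = np.array([0.5, 2.0, 5.0])
eta, zeta = optimal_multipliers(x, L=[1.0, 1.0, 1.0], U=[3.0, 3.0, 3.0])
flow = x * np.exp(eta - zeta)   # equals max(min(x, U), L), here [1.0, 2.0, 3.0]
```

At most one of the two multipliers is positive on each arc when $L_{i,j} < U_{i,j}$, in line with the case analysis of the proof.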

This proposition shows that we do not need to store the values of $\eta_{i,j}$ and $\zeta_{i,j}$. Indeed, for fixed $p$, if $(\eta, \zeta) = \arg\min_{\eta' \ge 0, \zeta' \ge 0} \theta(p, \eta', \zeta')$, then, denoting $\operatorname{mid}(x, a, b) = \max(\min(x, a), b)$, we have

$$A_{i,k}\, e^{p_i + \eta_{i,k} - \zeta_{i,k}} = \operatorname{mid}(A_{i,k} e^{p_i}, U_{i,k} e^{p_k}, L_{i,k} e^{p_k}),$$
$$A_{k,j}\, e^{-p_j + \eta_{k,j} - \zeta_{k,j}} = \operatorname{mid}(A_{k,j} e^{-p_j}, U_{k,j} e^{-p_k}, L_{k,j} e^{-p_k}) .$$

This gives an expression of coordinate descent with no storage of $\eta^{k+1}$ nor $\zeta^{k+1}$.

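Algorithm 6 Exact coordinate descent for truncated scaling

Given an initial vector $p^0$, compute iteratively $p^k$ by selecting a coordinate $l$ following a cyclic rule and computing $p^{k+1}_{l'} = p^k_{l'}$ for all $l' \neq l$ and

$$p^{k+1}_l = \frac{1}{2}\log\Big(\sum_{i\in[n]} \operatorname{mid}(A_{i,l} e^{p^k_i}, U_{i,l} e^{p^k_l}, L_{i,l} e^{p^k_l})\Big) - \frac{1}{2}\log\Big(\sum_{j\in[n]} \operatorname{mid}(A_{l,j} e^{-p^k_j}, U_{l,j} e^{-p^k_l}, L_{l,j} e^{-p^k_l})\Big) .$$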



Proposition 5.2. If $A$ has a truncated scaling, then Algorithm 6 converges linearly to a solution of the truncated scaling problem.

Proof. By Proposition 5.1, we can see that Algorithm 6 is a coordinate descent algorithm (Algorithm 3) applied to the minimization of the dual function $\theta$ defined in (5.2) such that, for every $l$, the coordinate selection order is

$$\eta_{1,l}, \dots, \eta_{l,n}, \zeta_{1,l}, \dots, \zeta_{l,n}, p_l .$$

As in Proposition 4.7, we then just verify the hypotheses of Proposition 3.1. □
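A minimal dense sketch of Algorithm 6 follows, assuming NumPy arrays `A`, `L`, `U` with $0 \le L \le U$ (entries of `U` may be `np.inf`) and a strongly connected pattern, so that the logarithms are finite. The function names are hypothetical, a fixed number of sweeps replaces a stopping criterion, and self-loops are excluded from the coordinate update because the diagonal term of (5.2) does not depend on $p_l$.

```python
import numpy as np

def mid(x, a, b):
    """mid(x, a, b) = max(min(x, a), b), elementwise."""
    return np.maximum(np.minimum(x, a), b)

def truncated_scaling_cd(A, L, U, n_sweeps=500):
    """Cyclic exact coordinate descent in the spirit of Algorithm 6 (sketch)."""
    n = A.shape[0]
    p = np.zeros(n)
    for _ in range(n_sweeps):
        for l in range(n):                        # cyclic selection rule
            e = np.exp(p)
            # truncated flows into l and out of l, evaluated at the current p
            inflow = mid(A[:, l] * e, U[:, l] * e[l], L[:, l] * e[l])
            outflow = mid(A[l, :] / e, U[l, :] / e[l], L[l, :] / e[l])
            inflow[l] = outflow[l] = 0.0          # self-loop term does not depend on p_l
            p[l] = 0.5 * (np.log(inflow.sum()) - np.log(outflow.sum()))
    return p

def truncated_matrix(A, L, U, p):
    """X = T(D^{-1} A D) with D = diag(e^{-p}): X_ij = mid(A_ij e^{p_i - p_j}, U_ij, L_ij)."""
    return mid(A * np.exp(p[:, None] - p[None, :]), U, L)
```

On a small strongly connected example with one capped arc, the returned $p$ gives a truncated matrix whose row and column sums agree up to rounding, while the capped entry respects its bound.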

Remark 5. Due to the truncation, it is not clear how to determine the primitivity 
of the gradient of the fixed point operator of a HOTS-like algorithm for the truncated 
scaling problem. 

Tomlin proposed in [31] to search for a flow of websurfers $\rho$ that maximizes the entropy for the effective HOTS problem (Section 4) with additional bound constraints of the type

$$L_{i,j} \le \rho_{i,j} \le U_{i,j} . \qquad (5.3)$$

These bound constraints model the fact that one may have information on the actual flow of websurfers through some hyperlinks, even if the flow on every hyperlink is out of reach.

We propose the following coordinate descent algorithm (Algorithm 7) that couples the algorithms presented in Proposition 4.7 and Algorithm 6. The proof of convergence is just the concatenation of the arguments of Propositions 4.7 and 5.2.

6. Comparison with alternative algorithms. We give in Table 6.1 a comparison of four algorithms for the matrix balancing problem: the interior-reflective Newton method (Matlab fminunc function), coordinate descent, DomEig and ideal HOTS. We considered a small matrix, a medium size matrix and two large matrices. The small matrix $A$, with $\varepsilon = 10^{-3}$, is nearly imprimitive. The CMAP matrix is the
Algorithm 7 Coordinate descent for the HOTS problem with bounds

Start with an initial point $y^0 \in \mathbb{R}^{n+1}$, $y^0 > 0$. Given $y^k$, select a coordinate $l \in [n+1]$ and compute $y^{k+1}$ such that $y^{k+1}_j = y^k_j$ for all $j \neq l$ and

$$y^{k+1}_l = \left( \frac{\sum_{i\in[n]} \operatorname{mid}(A_{i,l}\, y^k_i,\; U_{i,l}\, y^k_l,\; L_{i,l}\, y^k_l) + e^{a^k} y^k_{n+1}}{\sum_{j\in[n]} \operatorname{mid}(A_{l,j}\, (y^k_j)^{-1},\; U_{l,j}\, (y^k_l)^{-1},\; L_{l,j}\, (y^k_l)^{-1}) + e^{-b^k} (y^k_{n+1})^{-1}} \right)^{1/2}$$

If $l < n+1$, then set $(\mu^{k+1}, a^{k+1}, b^{k+1}) = (\mu^k, a^k, b^k)$; otherwise set $\mu^{k+1} = \log(\dots)$, $a^{k+1} = \log(\dots)$ and $b^{k+1} = -\log(\dots)$.





                              A          CMAP        NZ Uni        uk2002
                                         1,500 p.    413,639 p.    18,520,486 p.
|λ_2(F)| (Thm. …)             0.999      0.8739      0.9774        0.998
Matlab's fminunc              0.015 s    948 s       out of mem.
DomEig [14]                   1.87 s     34.4 s      > 600 s
Coordinate desc. (Alg. 2)     0.006 s    0.03 s      6.06 s        2,391 s
Ideal HOTS (Alg. 1)           2.0 s      0.02 s      7.52 s        1,868 s

Table 6.1

Execution times for 4 algorithms to solve the matrix balancing problem. General purpose algorithms like Newton methods (fminunc) are outperformed by coordinate descent and ideal HOTS for this problem. DomEig [14] does not seem to be very efficient for these sparse problems. On the other hand, coordinate descent and ideal HOTS have similar performances. We remark, however, that coordinate descent behaves better on imprimitive matrices, whereas we have an expression of the rate of convergence of ideal HOTS, which we do not have for coordinate descent.



adjacency matrix of a fragment of the web graph of size 1,500. The crawl consists of the www.cmap.polytechnique.fr website and surrounding pages. The NZ Uni matrix comes from a crawl of New Zealand universities' websites, available at [26]. It has 413,639 pages. The uk2002 matrix comes from a crawl of the .uk domain with 18,520,486 pages, gathered by UbiCrawler [3]. For the matrix balancing problem, we added a small positive constant equal to $1/n$ to the entries of the adjacency matrices arising from fragments of the web graph (to guarantee irreducibility of the matrix). We launched our numerical experiments on a personal computer with 4 Intel Xeon CPUs at 2.98 GHz and 8 GB RAM.

General convex optimization solvers, implementing algorithms such as Newton or quasi-Newton methods, are heavy machinery that can reach fast (superlinear or quadratic) local convergence and can handle general problems. However, they rely on complex algorithms: one should not program them from scratch but use robust public codes. Parallel computation is not trivial and tuning the parameters may be difficult. Second order methods also need to compute the Hessian matrix, which may be a large dense matrix.

The DomEig algorithm [13] consists of two loops. The inner loop is the power
                                  A          CMAP       NZ Uni     uk2002
|λ_2(∇F)| (Prop. 3.8)             0.884      0.946      0.995      0.9994
Coordinate descent (Prop. 4.7)    0.0061 s   0.0613 s   35.7 s     3,809 s
Effective HOTS (Alg. …)           0.0217 s   0.0589 s   36.5 s     3,017 s
PageRank [3]                      0.0216 s   0.0195 s   2.9 s      270 s

Table 6.2

Comparison of coordinate descent and HOTS for the effective HOTS problem with $\alpha = 0.9$. Both algorithms seem to perform equally well. In contrast with PageRank, the convergence rate deteriorates with the size of the fragment of the web graph considered.



method; it has a linear rate of convergence determined by the spectral gap of the matrix considered. Most of the computational cost lies in the power iterations, since the outer loop is simple. This algorithm is specially designed for matrix balancing and does not extend to other problems. It does not seem to be very efficient for sparse problems when compared to HOTS and coordinate descent. Moreover, the precision required for the determination of the principal eigenvalue has a strong impact on the performance of the algorithm.

Schneider's coordinate descent DSS algorithm for the matrix balancing problem is very efficient and scalable. It remains efficient for imprimitive matrices.

The ideal HOTS algorithm converges linearly, at a rate determined by the spectral gap of P. In our experiments, it performs well for medium and large size problems but lacks efficiency for imprimitive matrices. For the fragments of the web graph, it needed slightly more iterations than coordinate descent, but each iteration is slightly cheaper, so the execution times are about the same. Its advantage compared to coordinate descent is that its implementation only needs left and right matrix-vector products and element-wise operations (division and square root). Hence, it does not require computing and storing the transpose of the adjacency matrix, which can be useful for web scale applications.
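The contrast between the two methods can be sketched as follows. We assume here that the ideal HOTS update has the form $y_i \leftarrow \big(\sum_j A_{j,i} y_j / \sum_j A_{i,j} y_j^{-1}\big)^{1/2}$, that is, the coordinate update of Algorithm 6 with $L = 0$ and $U = +\infty$ applied to all coordinates simultaneously; the function names, tolerances and dense NumPy storage are illustrative choices. In NumPy, `y @ A` computes $A^T y$ as a left product, so no transpose is formed.

```python
import numpy as np

def ideal_hots(A, tol=1e-12, max_iter=10_000):
    """Simultaneous (Jacobi-like) scaling update, sketch of an ideal-HOTS-type iteration."""
    y = np.ones(A.shape[0])
    for _ in range(max_iter):
        # left product y @ A = A^T y, right product A @ (1/y); no transpose stored
        y_new = np.sqrt((y @ A) / (A @ (1.0 / y)))
        y_new /= np.exp(np.mean(np.log(y_new)))    # fix the free scale: geometric mean 1
        if np.max(np.abs(np.log(y_new / y))) < tol:
            return y_new
        y = y_new
    return y

def coordinate_descent_balancing(A, n_sweeps=1_000):
    """Cyclic exact coordinate descent on theta(p) = sum_ij A_ij e^{p_i - p_j}, y = e^p."""
    n = A.shape[0]
    y = np.ones(n)
    for _ in range(n_sweeps):
        for l in range(n):
            num = A[:, l] @ y - A[l, l] * y[l]           # sum_{i != l} A_il y_i
            den = A[l, :] @ (1.0 / y) - A[l, l] / y[l]   # sum_{j != l} A_lj / y_j
            y[l] = np.sqrt(num / den)                    # uses the freshest values
    return y / np.exp(np.mean(np.log(y)))
```

Both functions return the scaling normalized to geometric mean 1, since the balancing vector is only defined up to a multiplicative constant; with $\rho_{i,j} = A_{i,j} y_i / y_j$, row and column sums of $\rho$ then coincide.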

In Table 6.2, we compare coordinate descent and the effective HOTS algorithm for the effective HOTS problem. Both algorithms scale well and have comparable computational costs. However, we remark that the rate of convergence seems to deteriorate when the size of the problem increases. It might be necessary to choose smaller values of $\alpha$ for large graphs in order to compensate for this phenomenon. In the next section, we propose an alternative solution consisting in normalizing the adjacency matrix prior to computing the HOTS score, and hence minimizing a relative entropy instead of the entropy function.

In Table 6.3, we give the execution times for the HOTS problem with bounds (5.3).

7. A normalized HOTS algorithm for a better convergence rate. The previous study has shown two drawbacks of Tomlin's HOTS algorithm. First of all, its convergence rate seems to deteriorate when the size of the problem increases. The second problem concerns manipulations of the HOTS score: when one single page is considered, a very good strategy is to point to no page, and thus make this page a dangling node in the web graph. This comes from the relation of HOTS with the anti-Perron ranking (Remark 3). Indeed, the anti-Perron ranking penalizes bad quality links, but adding any outlink, even a good quality one, also diminishes the score of the page where it has been added.
                                                  A          CMAP       NZ Uni
Inexact coor. descent (Eq. (5.1) and [28])        …          4.28 s     > 2000 s
Exact coor. descent (Alg. 7)                      0.005 s    0.23 s     100 s

Table 6.3



Execution times for two algorithms to solve the effective HOTS problem where some arcs have prescribed bounds on their flow [31] (these bounds are determined at random). Exact coordinate descent (Algorithm 7) is faster than inexact coordinate descent. Indeed, we do not need any line search for Algorithm 7. By comparison with Table 6.2, we can see that the computational expense is multiplied by less than 4 with the bound constraints.



We now propose a modification of the HOTS algorithm that tackles those two 
issues. In order to stop penalizing the presence of hyperlinks on a page, we normalize 
the adjacency matrix, and thus state the entropy optimization problem as a relative 
entropy optimization problem. Then the Perron ranking reduces to PageRank and 
the normalized anti-Perron ranking does not penalize the number of links any more. 

We also need to address the problem of dangling nodes, for which the normalization of the corresponding row of the adjacency matrix is not defined, and the reducibility of the adjacency matrix. One possibility is to keep drawing inspiration from PageRank and consider the Google matrix instead of the normalized adjacency matrix. With this choice, dangling pages are considered to point to every page, which implies that they have a low rank in the normalized HOTS score, when compared to the rank given by PageRank. Instead, we suggest using Tomlin's effective network model to solve the reducibility problem and adding another fictitious node that points to every page and is pointed to by every dangling page.
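The construction just described can be sketched in a few lines. The layout below is an assumption based on the prose and on the constraints of (7.1), not a transcription of the original definition: pages are indexed $0,\dots,n-1$, index $n$ is the new fictitious node (pointed to by dangling pages, pointing to every page), index $n+1$ is Tomlin's effective node (linked to and from every page), and the two fictitious nodes are not linked together. The function name `normalized_effective_matrix` is hypothetical.

```python
import numpy as np

def normalized_effective_matrix(A):
    """Hypothetical (n+2) x (n+2) augmented matrix for normalized HOTS (sketch).

    Returns (M, f) where f is the 0-1 dangling-node indicator vector.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    f = (out_deg == 0).astype(float)              # f_i = 1 iff page i is dangling
    M = np.zeros((n + 2, n + 2))
    nz = out_deg > 0
    M[:n, :n][nz] = A[nz] / out_deg[nz, None]     # row-normalized adjacency; dangling rows stay 0
    M[:n, n] = f                                  # dangling pages -> new fictitious node
    M[n, :n] = 1.0                                # new fictitious node -> every page
    M[:n, n + 1] = 1.0                            # every page -> Tomlin's effective node
    M[n + 1, :n] = 1.0                            # Tomlin's effective node -> every page
    return M, f
```

Keeping the dangling pages' rows at zero in the page block and routing them through the extra node avoids treating them as pointing to every page, which is the point of the construction described above.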

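We end up with the following network flow problem, where the $(n+2) \times (n+2)$ matrix $M$ is defined by

$$M_{i,j} = \begin{cases} A_{i,j} / \sum_k A_{i,k} & \text{if } i, j \le n,\ \sum_k A_{i,k} \ge 1,\\ 0 & \text{if } i, j \le n,\ \sum_k A_{i,k} = 0,\end{cases} \qquad M = \begin{pmatrix} (M_{i,j})_{i,j\le n} & f & \mathbf{1} \\ \mathbf{1}^T & 0 & 0 \\ \mathbf{1}^T & 0 & 0 \end{pmatrix},$$

and $f$ is the 0-1 vector such that $f_i = 1$ if and only if $i$ is a dangling node.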



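$$\max_{\rho \ge 0}\; -\sum_{i,j\in[n+2]} \rho_{i,j}\Big(\log\Big(\frac{\rho_{i,j}}{M_{i,j}}\Big) - 1\Big) \qquad (7.1)$$

$$\sum_{i,j\in[n+2]} \rho_{i,j} = 1, \qquad \sum_{j\in[n+2]} \rho_{i,j} = \sum_{j\in[n+2]} \rho_{j,i}, \quad \forall i \in [n+2],$$

$$\sum_{j\in[n]} \rho_{n+2,j} = 1 - \alpha = \sum_{i\in[n]} \rho_{i,n+2} .$$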

We call this optimization problem the normalized HOTS problem. The normalized HOTS algorithm is defined in the same way as the effective HOTS algorithm but with $M$ instead of $A'$. As in Lemma 4.1, we define $\mu' = \log(\dots)$,
                                       CMAP       NZ Uni     uk2002
|λ_2(∇F)| (classical HOTS)             0.946      0.995      0.9994
|λ_2(∇F)| (normalized HOTS)            0.906      0.988      0.96
Execution times for normalized HOTS    0.055 s    46.24 s    752 s

Table 7.1



Performance of the normalized HOTS algorithm presented in Section 7. The correlation between the deterioration of the convergence rate of the algorithm, given by the second eigenvalue of the matrix $\nabla F$, and the size of the data set no longer holds. Moreover, in all the tests we performed, the convergence rate remained under 0.99.



$a' = \log(\dots)$, $b' = -\log(\dots)$ and the triple $\lambda'(p) = (\mu', a', b')$. For $\lambda' \in \mathbb{R}^3$, we denote

$g'_{\lambda'}(y) = \dots$



Algorithm 8 Normalized HOTS algorithm

Given an initial point $y^0 \in \mathbb{R}^{n+2}$, the normalized HOTS algorithm is defined by

$$y^{k+1} = g'_{\lambda'(y^k)}(y^k) .$$



Proposition 7.1. If there exists a primal feasible point to (7.1) with the same pattern as $M$, then the normalized HOTS algorithm converges with a linear rate of convergence.

Proof. In the proof of Theorem 4.6, we did not use the fact that the adjacency matrix $A$ is a 0-1 matrix, only that its elements are nonnegative. Hence, the convergence proof of the effective HOTS algorithm directly applies to the normalized HOTS algorithm. □

We can see in Table 7.1 that the convergence properties of this algorithm seem to be better than those of the classical HOTS algorithm. Nevertheless, these experimental results are still to be validated by other theoretical and numerical studies.

Acknowledgment. I gratefully thank Stephane Gaubert, who advised me to apply nonlinear Perron-Frobenius theory to tackle the convergence of the HOTS algorithm.